Abstract

Representations of Boolean functions by real polynomials play an important
role in complexity theory. Typically, one is interested in the least
degree of a polynomial $p(x_1, \ldots, x_n)$ that approximates or sign-represents a
given Boolean function $f(x_1, \ldots, x_n)$. This article surveys a new and growing
body of work in communication complexity that centers around the dual
objects, i.e., polynomials that certify the difficulty of approximating or sign- 
representing a given function. We provide a unified guide to the following 
results, complete with all the key proofs: 

• Sherstov's Degree/Discrepancy Theorem, which translates lower 
bounds on the threshold degree of a Boolean function into upper 
bounds on the discrepancy of a related function; 

• Two different methods for proving lower bounds on bounded-error 
communication based on the approximate degree: Sherstov's pattern 
matrix method and Shi and Zhu's block composition method; 

• Extension of the pattern matrix method to the multiparty model, ob- 
tained by Lee and Shraibman and by Chattopadhyay and Ada, and the 
resulting improved lower bounds for disjointness; 

• David and Pitassi's separation of NP and BPP in multiparty communication
complexity for $k \le (1 - \epsilon) \log n$ players.



*Invited survey for The Bulletin of the European Association for Theoretical Computer Science 
(EATCS). 



1 Introduction 



Representations of Boolean functions by real polynomials are of considerable im- 
portance in complexity theory. The ease or difficulty of representing a given 
Boolean function by polynomials from a given set often yields valuable insights 
into the structural complexity of that function. 

We focus on two concrete representation schemes that involve polynomials.
The first of these corresponds to threshold computation. For a Boolean
function $f : \{0,1\}^n \to \{0,1\}$, its threshold degree $\deg_\pm(f)$ is the minimum
degree of a polynomial $p(x_1, \ldots, x_n)$ such that $p(x)$ is positive if $f(x) = 1$
and negative otherwise. In other words, the threshold degree of $f$ is the least
degree of a polynomial that represents $f$ in sign. Several authors have analyzed
the threshold degree of common Boolean functions [MP88, BRS95, OS03].
The results of these investigations have found numerous applications to circuit
complexity [ABFR94, BRS95, KP97, KP98] and computational learning theory
[KS04, KOS04, KS07b].
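The definition of sign-representation can be checked by brute force on small inputs. The following is a minimal sketch, not taken from the surveyed papers; the helper `sign_represents` and the two example polynomials are illustrative assumptions chosen for this note.

```python
from itertools import product

def sign_represents(p, f, n):
    """Return True if p(x) > 0 exactly when f(x) = 1 and p(x) < 0 when f(x) = 0."""
    for x in product((0, 1), repeat=n):
        value = p(x)
        if f(x) == 1 and not value > 0:
            return False
        if f(x) == 0 and not value < 0:
            return False
    return True

def or3(x):
    """OR of three bits."""
    return int(any(x))

def xor2(x):
    """XOR of two bits."""
    return x[0] ^ x[1]

def p_or(x):
    """Degree-1 polynomial x1 + x2 + x3 - 1/2 (hypothetical example)."""
    return sum(x) - 0.5

def p_xor(x):
    """Degree-2 polynomial x1 + x2 - 2*x1*x2 - 1/2 (hypothetical example)."""
    return x[0] + x[1] - 2 * x[0] * x[1] - 0.5

def p_linear(x):
    """A degree-1 candidate for XOR; it fails on the input (1, 1)."""
    return x[0] + x[1] - 0.5
```

Here `p_or` witnesses that OR has threshold degree at most 1, while the particular linear candidate `p_linear` fails for XOR; ruling out every linear polynomial requires a separate argument, such as the dual one discussed later in the survey.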

The other representation scheme that we consider is approximation in the
uniform norm. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$ and a constant
$\epsilon \in (0, 1/2)$, the $\epsilon$-approximate degree of $f$ is the least degree of a polynomial
$p(x_1, \ldots, x_n)$ with $|f(x) - p(x)| \le \epsilon$ for all $x \in \{0,1\}^n$. Note that this
representation is strictly stronger than the first: no longer are we content with
representing $f$ in sign, but rather we wish to closely approximate $f$ on every
input. There is a considerable literature on the approximate degree of specific
Boolean functions [NS92, Pat92, KLS96, BCWZ99, AS04, She08, Wol08]. This
classical notion has been crucial to progress on a variety of questions, including
quantum query complexity [BCWZ99, BBC+01, AS04], communication
complexity [BW01, Raz03, BVW07], and computational learning theory [TT99, KS04,
KKMS05, KS07a].
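Uniform approximation can likewise be verified by enumeration. The sketch below is an illustration only; the function `max_error` and the example polynomial are assumptions introduced here. It measures the worst-case error of a degree-1 polynomial for the AND of two bits, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def max_error(p, f, n):
    """Largest value of |f(x) - p(x)| over all x in {0,1}^n."""
    return max(abs(f(x) - p(x)) for x in product((0, 1), repeat=n))

def and2(x):
    """AND of two bits."""
    return x[0] & x[1]

def p_and(x):
    """Degree-1 polynomial (x1 + x2)/2 - 1/4 (hypothetical example)."""
    return Fraction(x[0] + x[1], 2) - Fraction(1, 4)
```

The error of `p_and` equals 1/4 on every input, so for $\epsilon = 1/4$ the $\epsilon$-approximate degree of two-bit AND is at most 1, even though representing AND exactly requires degree 2.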

The approximate degree and threshold degree can be conveniently analyzed by
means of a linear program. In particular, whenever a given function $f$ cannot be
approximated or sign-represented by polynomials of low degree, linear-programming
duality implies the existence of a certain dual object to witness that fact. This dual
object, which is a real function or a probability distribution, reveals useful new
information about the structural complexity of $f$. The purpose of this article is to
survey a very recent and growing body of work in communication complexity that 
revolves around the dual formulations of the approximate degree and threshold 
degree. Our ambition here is to provide a unified view of these diverse results, 
complete with all the key proofs, and thereby to encourage further inquiry into the 
potential of the dual approach. 

In the remainder of this section, we give an intuitive overview of our survey. 
Degree/Discrepancy Theorem. The first result that we survey, in Section 3, is
the author's Degree/Discrepancy Theorem [She07a]. This theorem and its proof
technique are the foundation for much of the subsequent work surveyed in this
article [She07b, Cha07, LS07, CA08, DP08]. Fix a Boolean function $f : \{0,1\}^n \to
\{0,1\}$ and let $N$ be a given integer, $N \ge n$. In [She07a], we introduced the two-party
communication problem of computing

\[ f(x|_V), \]

where the Boolean string $x \in \{0,1\}^N$ is Alice's input and the set $V \subset \{1, 2, \ldots, N\}$
of size $|V| = n$ is Bob's input. The symbol $x|_V$ stands for the projection of $x$ onto
the indices in $V$, in other words, $x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_n}) \in \{0,1\}^n$, where
$i_1 < i_2 < \cdots < i_n$ are the elements of $V$. Intuitively, this problem models a situation in which
Alice and Bob's joint computation depends on only $n$ of the inputs $x_1, x_2, \ldots, x_N$.
Alice knows the values of all the inputs $x_1, x_2, \ldots, x_N$ but does not know which $n$
of them are relevant. Bob, on the other hand, knows which $n$ inputs are relevant
but does not know their values.
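The projection $x|_V$ and the resulting communication matrix are easy to build explicitly for tiny parameters. The following sketch is illustrative; the names `project` and `communication_matrix` are assumptions of this note, and indices in $V$ are taken to be 1-based as in the text.

```python
from itertools import combinations, product

def project(x, V):
    """Return x|_V: the bits of x at the (1-based) indices in V, in increasing order."""
    return tuple(x[i - 1] for i in sorted(V))

def communication_matrix(f, n, N):
    """Rows are strings x in {0,1}^N, columns are size-n subsets V of {1,...,N}.

    The (x, V) entry is f(x|_V).
    """
    rows = list(product((0, 1), repeat=N))
    cols = list(combinations(range(1, N + 1), n))
    matrix = [[f(project(x, V)) for V in cols] for x in rows]
    return rows, cols, matrix

def and2(x):
    """AND of two bits, used as the base function f in the usage example."""
    return x[0] & x[1]
```

For example, with `f = and2`, `n = 2`, and `N = 3`, the matrix has $2^3 = 8$ rows and $\binom{3}{2} = 3$ columns. The matrix grows exponentially in $N$, so this is useful only as a sanity check on the definition.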

We proved in [She07a] that the threshold degree $d$ of $f$ is a lower bound
on the communication requirements of this problem. More precisely, the
Degree/Discrepancy Theorem shows that this communication problem has discrepancy
$\exp(-\Omega(d))$ as soon as $N \ge 11n^2/d$. This exponentially small discrepancy
immediately gives an $\Omega(d)$ lower bound on communication in a variety of models
(deterministic, nondeterministic, randomized, quantum with and without entanglement).
Moreover, the resulting lower bounds on communication hold even if the
desired error probability is vanishingly close to 1/2.

The proof of the Degree/Discrepancy Theorem introduces a novel technique
based on the dual formulation of the threshold degree. In fact, it appears to be the
first use of the threshold degree (in its primal or dual form) to prove communication
lower bounds. As an application, we exhibit in [She07a] the first AC$^0$ circuit with
exponentially small discrepancy, thereby separating AC$^0$ from depth-2 majority
circuits and solving an open problem of Krause and Pudlák [KP97, §6]. Independently
of the author, Buhrman et al. [BVW07] exhibited another AC$^0$ function with
exponentially small discrepancy, using much different techniques.

Bounded-Error Communication. Next, we present two recent results on
bounded-error communication complexity, due to Sherstov [She07b] and Shi and
Zhu [SZ07]. These papers use the notion of approximate degree to contribute
strong lower bounds for rather broad classes of functions, subsuming Razborov's
breakthrough work on symmetric predicates [Raz03]. The lower bounds are valid
not only in the randomized model, but also in the quantum model with and without
prior entanglement.
The setting in which to view these two works is the generalized discrepancy
method, a simple but very useful principle introduced by Klauck [Kla01] and
reformulated in its current form by Razborov [Raz03]. Let $f(x, y)$ be a Boolean function
whose quantum communication complexity is of interest. The method asks for a
Boolean function $h(x, y)$ and a distribution $\mu$ on $(x, y)$-pairs such that:

(1) the functions $f$ and $h$ are highly correlated under $\mu$; and

(2) all low-cost protocols have negligible advantage in computing $h$ under $\mu$.

If such $h$ and $\mu$ indeed exist, it follows that no low-cost protocol can compute $f$ to
high accuracy (or else it would be a good predictor for the hard function $h$ as well!).
This method is in no way restricted to the quantum model but, rather, applies to
any model of communication [She07b, §2.4]. The importance of the generalized
discrepancy method is that it makes it possible, in theory, to prove lower bounds
for functions such as disjointness, to which the traditional discrepancy method
does not apply. In Section 4, we provide detailed historical background on the
generalized discrepancy method and compile its quantitative versions for several
models.

The hard part, of course, is finding $h$ and $\mu$. Except in rather restricted
cases [Kla01, Thm. 4], it was not known how to do it. As a result, the generalized
discrepancy method was of limited practical use. This difficulty was overcome
independently by Sherstov [She07b] and Shi and Zhu [SZ07], who used the dual
characterization of the approximate degree to obtain $h$ and $\mu$ for a broad range of
problems. To our knowledge, the work in [She07b] and [SZ07] is the first use of
the dual characterization of the approximate degree to prove communication lower
bounds. The specifics of these two works are very different. The construction of $h$
and $\mu$ in [She07b], which we called the pattern matrix method for lower bounds on
bounded-error communication, is built around a new matrix-analytic technique (the
pattern matrix) inspired by the author's Degree/Discrepancy Theorem. The
construction in [SZ07], the block-composition method, is based on the idea of hardness
amplification by composition. These two methods exhibit quite different behavior,
e.g., the pattern matrix method further extends to the multiparty model. We present
the two methods individually in Sections 5.1 and 5.2 and provide a detailed
comparison of their strength and applicability in Section 5.3.

Extensions to the Multiparty Model. Both the Degree/Discrepancy Theorem
[She07a] and the pattern matrix method [She07b] generalize to the multiparty
number-on-the-forehead model. In the case of [She07a], this extension was
formalized by Chattopadhyay [Cha07]. As before, let $f : \{0,1\}^n \to \{0,1\}$ be a given
function. Recall that in the two-party case, there was a Boolean string $x \in \{0,1\}^N$
and a single set $V \subset \{1, 2, \ldots, N\}$. The $k$-party communication problem features
a Boolean string $x \in \{0,1\}^{N^{k-1}}$ and sets $V_1, \ldots, V_{k-1} \subset \{1, 2, \ldots, N\}$. The $k$ inputs
$x, V_1, \ldots, V_{k-1}$ are distributed among the $k$ parties as usual. The goal is to compute

\[ f(x|_{V_1, \ldots, V_{k-1}}) := f\bigl(x_{i_1^1, \ldots, i_1^{k-1}}, \ldots, x_{i_n^1, \ldots, i_n^{k-1}}\bigr), \tag{1.1} \]

where $i_1^j < i_2^j < \cdots < i_n^j$ are the elements of $V_j$ (for $j = 1, 2, \ldots, k - 1$). This
way, again no party knows at once the Boolean string $x$ and the relevant bits in
it. With this setup in place, it becomes relatively straightforward to bound the
discrepancy by traversing the same line of reasoning as in [She07a]. The extension
of the pattern matrix method [She07b] to the multiparty model uses a similar setup
and was done by Lee and Shraibman [LS07] and independently by Chattopadhyay
and Ada [CA08]. We present the proofs of these extensions in Section 6, placing
them in close correspondence with the two-party case. These extensions do not
subsume the two-party results, however (see Section 6 for details).

The authors of [LS07] and [CA08] gave important applications of their work
to the $k$-party randomized communication complexity of disjointness, improving
it from $\Omega(\frac{1}{k} \log n)$ to $n^{\Omega(1/k)} 2^{-O(2^k)}$. As a corollary, they separated the multiparty
communication classes $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ for $k = (1 - o(1)) \log_2 \log_2 n$ parties. They
also obtained new results for Lovász-Schrijver proof systems, in light of the work
due to Beame, Pitassi, and Segerlind [BPS07].

Separation of $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$. The separation of the classes $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$
in [LS07, CA08] for $k = (1 - o(1)) \log_2 \log_2 n$ parties was followed by another
exciting development, due to David and Pitassi [DP08], who separated these classes
for $k \le (1 - \epsilon) \log_2 n$ parties. Here $\epsilon > 0$ is an arbitrary constant. Since the current
barrier for explicit lower bounds on multiparty communication complexity is
precisely $k = \log_2 n$, David and Pitassi's separation matches the state of the art. We
present this work in Section 7. The powerful idea in this result was to redefine
the projection operator $x|_{V_1, \ldots, V_{k-1}}$ in (1.1). Specifically, David and Pitassi observed
that it suffices to define the projection operator at random, using the probabilistic
method. This insight removed the key technical obstacle present in [LS07, CA08].
In a follow-up work by David, Pitassi, and Viola [DPV08], the probabilistic
construction was derandomized to yield an explicit separation.

Other Related Work. For completeness, we will mention several duality-based
results in communication complexity that fall outside the scope of this survey.
Recent work has seen other applications of dual polynomials [She07c, RS08], which
are considerably more complicated and no longer correspond to the approximate
degree or threshold degree. More broadly, several recent results feature other forms
of duality [LS07b, LSS08], such as the duality of norms or semidefinite programming
duality.



2 Preliminaries 

This section reviews our notation and provides relevant technical background.

2.1 General Background

A Boolean function is a mapping $X \to \{0,1\}$, where $X$ is a finite set such as $X =
\{0,1\}^n$ or $X = \{0,1\}^n \times \{0,1\}^n$. The notation $[n]$ stands for the set $\{1, 2, \ldots, n\}$. For
integers $N, n$ with $N \ge n$, the symbol $\binom{[N]}{n}$ denotes the family of all size-$n$ subsets
of $\{1, 2, \ldots, N\}$. For $x \in \{0,1\}^n$, we write $|x| = x_1 + \cdots + x_n$. For $x, y \in \{0,1\}^n$, the
notation $x \wedge y$ refers as usual to the component-wise AND of $x$ and $y$. In particular,
$|x \wedge y|$ stands for the number of positions where $x$ and $y$ both have a 1. Throughout
this manuscript, "log" refers to the logarithm to base 2.

For tensors $A, B : X_1 \times \cdots \times X_k \to \mathbb{R}$ (where $X_i$ is a finite set, $i = 1, 2, \ldots, k$),
define $\langle A, B \rangle = \sum_{(x_1, \ldots, x_k) \in X_1 \times \cdots \times X_k} A(x_1, \ldots, x_k) B(x_1, \ldots, x_k)$. When $A$ and $B$ are
vectors or matrices, this is the standard definition of inner product. The Hadamard
product of $A$ and $B$ is the tensor $A \circ B : X_1 \times \cdots \times X_k \to \mathbb{R}$ given by
$(A \circ B)(x_1, \ldots, x_k) = A(x_1, \ldots, x_k) B(x_1, \ldots, x_k)$.

The symbol $\mathbb{R}^{m \times n}$ refers to the family of all $m \times n$ matrices with real entries.
The $(i, j)$th entry of a matrix $A$ is denoted by $A_{ij}$. We frequently use "generic-entry"
notation to specify a matrix succinctly: we write $A = [F(i, j)]_{i,j}$ to mean that the
$(i, j)$th entry of $A$ is given by the expression $F(i, j)$. In most matrices that arise in
this work, the exact ordering of the columns (and rows) is irrelevant. In such cases
we describe a matrix by the notation $[F(i, j)]_{i \in I,\, j \in J}$, where $I$ and $J$ are some index
sets.

Let $A \in \mathbb{R}^{m \times n}$. We use the following standard notation: $\|A\|_\infty = \max_{i,j} |A_{ij}|$
and $\|A\|_1 = \sum_{i,j} |A_{ij}|$. We denote the singular values of $A$ by $\sigma_1(A) \ge \sigma_2(A) \ge
\cdots \ge \sigma_{\min\{m,n\}}(A) \ge 0$. Recall that the spectral norm of $A$ is given by $\|A\| =
\max_{x \in \mathbb{R}^n,\, \|x\| = 1} \|Ax\| = \sigma_1(A)$. An excellent reference on matrix analysis is [HJ86].

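We conclude with a review of the Fourier transform over $\mathbb{Z}_2^n$. Consider the
vector space of functions $\{0,1\}^n \to \mathbb{R}$, equipped with the inner product $\langle f, g \rangle =
2^{-n} \sum_{x \in \{0,1\}^n} f(x) g(x)$. For $S \subseteq [n]$, define $\chi_S : \{0,1\}^n \to \{-1, +1\}$ by $\chi_S(x) =
(-1)^{\sum_{i \in S} x_i}$. Then $\{\chi_S\}_{S \subseteq [n]}$ is an orthonormal basis for the inner product space in
question. As a result, every function $f : \{0,1\}^n \to \mathbb{R}$ has a unique representation of
the form $f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \chi_S(x)$, where $\hat{f}(S) = \langle f, \chi_S \rangle$. The reals $\hat{f}(S)$ are called
the Fourier coefficients of $f$. The following fact is immediate from the definition of
$\hat{f}(S)$:

Proposition 2.1. Fix $f : \{0,1\}^n \to \mathbb{R}$. Then

\[ \max_{S \subseteq [n]} |\hat{f}(S)| \le 2^{-n} \sum_{x \in \{0,1\}^n} |f(x)|. \]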
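For small $n$, the Fourier coefficients can be computed by direct enumeration. The sketch below is an illustration under the conventions just described (inner product normalized by $2^{-n}$, characters $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$); the function names are assumptions of this note, and subsets $S$ are represented as tuples of 0-based positions. It also lets one check the elementary bound $\max_S |\hat{f}(S)| \le 2^{-n} \sum_x |f(x)|$, which follows directly from the definition of the coefficients.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i over i in S), with 0-based positions."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coefficients(f, n):
    """Return {S: <f, chi_S>} for all subsets S of {0, ..., n-1}."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = sum(f(x) * chi(S, x) for x in points)
            coeffs[S] = Fraction(total, 2 ** n)
    return coeffs

def evaluate_expansion(coeffs, x):
    """Evaluate the Fourier expansion sum_S coeffs[S] * chi_S(x) at the point x."""
    return sum(c * chi(S, x) for S, c in coeffs.items())

def and2(x):
    """AND of two bits, viewed as a real-valued function."""
    return x[0] & x[1]
```

For the two-bit AND, all four coefficients have absolute value 1/4, which matches $2^{-n} \sum_x |f(x)| = 1/4$, so the bound is attained in this case.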



2.2 Communication Complexity 

This survey features several standard models of communication. In the case of two
communicating parties, one considers a function $f : X \times Y \to \{0,1\}$, where $X$ and
$Y$ are some finite sets. Alice receives an input $x \in X$, Bob receives $y \in Y$, and their
objective is to predict $f(x, y)$ with good accuracy. To this end, Alice and Bob share
a communication channel (classical or quantum, depending on the model). Alice
and Bob's communication protocol is said to have error $\epsilon$ if it outputs the correct
answer $f(x, y)$ with probability at least $1 - \epsilon$ on every input. The cost of a given
protocol is the maximum number of bits exchanged on any input. The two-party
models of interest to us are the randomized model, the quantum model without
prior entanglement, and the quantum model with prior entanglement. The least cost
of an $\epsilon$-error protocol for $f$ in these models is denoted by $R_\epsilon(f)$, $Q_\epsilon(f)$, and $Q^*_\epsilon(f)$,
respectively. It is standard practice to omit the subscript $\epsilon$ when the error parameter
is $\epsilon = 1/3$. Recall that the error probability of a protocol can be decreased from
1/3 to any other constant $\epsilon > 0$ at the expense of increasing the communication
cost by a constant factor; we will use this fact in many proofs of this survey, often
without explicitly mentioning it. Excellent references on these communication
models are [KN97] and [Wol01].

A generalization of two-party communication is number-on-the-forehead
multiparty communication. Here one considers a function $f : X_1 \times \cdots \times X_k \to \{0,1\}$
for some finite sets $X_1, \ldots, X_k$. There are $k$ players. A given input $(x_1, \ldots, x_k) \in
X_1 \times \cdots \times X_k$ is distributed among the players by placing $x_i$ on the forehead of player
$i$ (for $i = 1, \ldots, k$). In other words, player $i$ knows $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k$ but not
$x_i$. The players can communicate by writing bits on a shared blackboard, visible to
all. They additionally have access to a shared source of random bits. Their goal is
to devise a communication protocol that will allow them to accurately predict the
value of $f$ on every input. Analogous to the two-party case, the randomized
communication complexity $R^k_\epsilon(f)$ is the least cost of an $\epsilon$-error communication protocol
for $f$ in this model. The final section of this paper also considers the nondeterministic
communication complexity $N^k(f)$, which is the minimum cost of a protocol
for $f$ that always outputs the correct answer on the inputs $f^{-1}(0)$ and has error
probability less than 1 on each of the inputs $f^{-1}(1)$. Analogous to computational
complexity, $\mathrm{BPP}^{cc}_k$ (respectively, $\mathrm{NP}^{cc}_k$) is the class of functions $f : (\{0,1\}^n)^k \to \{0,1\}$
with $R^k(f) \le (\log n)^{O(1)}$ (respectively, $N^k(f) \le (\log n)^{O(1)}$). See [KN97] for further
details.
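A crucial tool for proving communication lower bounds is the discrepancy
method. Given a function $f : X \times Y \to \{0,1\}$ and a distribution $\mu$ on $X \times Y$,
the discrepancy of $f$ with respect to $\mu$ is defined as

\[ \mathrm{disc}_\mu(f) = \max_{S \subseteq X,\; T \subseteq Y} \left| \sum_{x \in S} \sum_{y \in T} (-1)^{f(x,y)} \mu(x, y) \right|. \]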
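For very small matrices, the two-party discrepancy can be computed exactly from its definition. The sketch below is an illustration, not a method from the surveyed papers; the function name `discrepancy` and the dictionary representation of the distribution are assumptions of this note. It enumerates every row set $S$ and uses the fact that, for fixed $S$, the best column set $T$ collects either all columns with positive sum or all columns with negative sum.

```python
from fractions import Fraction
from itertools import product

def discrepancy(f, X, Y, mu):
    """Exact discrepancy of f with respect to mu, where mu maps (x, y) to a probability."""
    best = Fraction(0)
    for keep_rows in product((False, True), repeat=len(X)):
        # Signed mass of each column restricted to the chosen rows S.
        col_sums = [
            sum(mu[(x, y)] * (-1) ** f(x, y) for x, keep in zip(X, keep_rows) if keep)
            for y in Y
        ]
        positive = sum(c for c in col_sums if c > 0)
        negative = -sum(c for c in col_sums if c < 0)
        best = max(best, positive, negative)
    return best

def inner_product_mod2(x, y):
    """Inner product of two bit strings modulo 2."""
    return sum(a & b for a, b in zip(x, y)) % 2

def uniform(X, Y):
    """Uniform distribution on X x Y."""
    return {(x, y): Fraction(1, len(X) * len(Y)) for x in X for y in Y}
```

The running time is exponential in $|X|$, so this is only a sanity check on the definition. For the inner product function on two-bit inputs under the uniform distribution, the enumeration gives discrepancy 5/16.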



This definition generalizes to the multiparty case as follows. Fix $f : X_1 \times \cdots \times X_k \to
\{0,1\}$ and a distribution $\mu$ on $X_1 \times \cdots \times X_k$. The discrepancy of $f$ with respect to $\mu$
is defined as

\[ \mathrm{disc}_\mu(f) = \max_{\phi_1, \ldots, \phi_k} \left| \sum_{(x_1, \ldots, x_k) \in X_1 \times \cdots \times X_k} \psi(x_1, \ldots, x_k) \prod_{i=1}^{k} \phi_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k) \right|, \]

where $\psi(x_1, \ldots, x_k) = (-1)^{f(x_1, \ldots, x_k)} \mu(x_1, \ldots, x_k)$ and the maximum ranges over all
functions $\phi_i : X_1 \times \cdots \times X_{i-1} \times X_{i+1} \times \cdots \times X_k \to \{0,1\}$, for $i = 1, 2, \ldots, k$. Note that
for $k = 2$, this definition is identical to the one given previously for the two-party
model. We put $\mathrm{disc}(f) = \min_\mu \mathrm{disc}_\mu(f)$. We identify a function $f : X_1 \times \cdots \times X_k \to
\{0,1\}$ with its communication tensor $M(x_1, \ldots, x_k) = (-1)^{f(x_1, \ldots, x_k)}$ and speak of
the discrepancy of $M$ and $f$ interchangeably (and likewise for other complexity
measures, such as $R^k_\epsilon(f)$).

Discrepancy is difficult to analyze as defined. Typically, one uses the following
well-known estimate, derived by repeated applications of the Cauchy-Schwarz
inequality.

Theorem 2.2 ([BNS92, CT93, Raz00]). Fix $f : X_1 \times \cdots \times X_k \to \{0,1\}$ and a
distribution $\mu$ on $X_1 \times \cdots \times X_k$. Put $\psi(x_1, \ldots, x_k) = (-1)^{f(x_1, \ldots, x_k)} \mu(x_1, \ldots, x_k)$. Then

\[ \left( \frac{\mathrm{disc}_\mu(f)}{|X_1| \cdots |X_k|} \right)^{2^{k-1}} \le \mathop{\mathbf{E}}_{x_1^0, x_1^1 \in X_1} \cdots \mathop{\mathbf{E}}_{x_{k-1}^0, x_{k-1}^1 \in X_{k-1}} \left| \mathop{\mathbf{E}}_{x_k \in X_k} \prod_{z \in \{0,1\}^{k-1}} \psi(x_1^{z_1}, \ldots, x_{k-1}^{z_{k-1}}, x_k) \right|. \]

In the case of $k = 2$ parties, there are other ways to estimate the discrepancy, e.g.,
using the spectral norm of a matrix.

For a function $f : X_1 \times \cdots \times X_k \to \{0,1\}$ and a distribution $\mu$ over $X_1 \times
\cdots \times X_k$, let $D^{k,\mu}_\epsilon(f)$ denote the least cost of a deterministic protocol for $f$ whose
probability of error with respect to $\mu$ is at most $\epsilon$. This quantity is known as the
$\mu$-distributional complexity of $f$. Since a randomized protocol can be viewed as
a probability distribution over deterministic protocols, we immediately have that
$R^k_\epsilon(f) \ge \max_\mu D^{k,\mu}_\epsilon(f)$. We are now ready to state the discrepancy method.

Theorem 2.3 (Discrepancy method; see [KN97]). For every $f : X_1 \times \cdots \times X_k \to
\{0,1\}$, every distribution $\mu$ on $X_1 \times \cdots \times X_k$, and every $\gamma \in (0, 1]$,

\[ R^k_{1/2 - \gamma/2}(f) \ge D^{k,\mu}_{1/2 - \gamma/2}(f) \ge \log_2 \frac{\gamma}{\mathrm{disc}_\mu(f)}. \]

In other words, a function with small discrepancy is hard to compute to any
nontrivial advantage over random guessing (let alone compute it to high accuracy). In
the case of $k = 2$ parties, discrepancy yields analogous lower bounds even in the
quantum model, regardless of prior entanglement [Kre95, Kla01, LS07b].
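To get a feel for the numbers, the discrepancy method in its usual form bounds the cost of any protocol with error $1/2 - \gamma/2$ from below by $\log_2(\gamma / \mathrm{disc}_\mu(f))$. The helper below is a minimal sketch of that arithmetic; the function name and the input validation are assumptions of this note.

```python
import math

def discrepancy_method_bound(disc, gamma):
    """Lower bound log2(gamma / disc) on the cost of a protocol with error 1/2 - gamma/2."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    if not 0 < disc <= 1:
        raise ValueError("discrepancy must lie in (0, 1]")
    return math.log2(gamma / disc)
```

For instance, a function with discrepancy $2^{-20}$ requires at least 19 bits of communication to achieve advantage $\gamma = 1/2$, that is, error 1/4. When the discrepancy exceeds $\gamma$ the bound is negative and therefore vacuous.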

3 The Degree/Discrepancy Theorem 

This section presents the author's Degree/Discrepancy Theorem, whose proof
technique is the foundation for much of the subsequent work surveyed in this
article [She07b, Cha07, LS07, CA08, DP08].

The original motivation behind this result came from circuit complexity. A natural
and well-studied computational model is that of a polynomial-size circuit of
majority gates. Research has shown that majority circuits of depth 2 and 3 already
possess surprising computational power. Indeed, it is a long-standing open problem
[KP97] to exhibit a Boolean function that cannot be computed by a depth-3
majority circuit of polynomial size.

Another extensively studied model is that of polynomial-size constant-depth
circuits with AND, OR, NOT gates, denoted by AC$^0$. Allender's classic result [All89]
states that every function in AC$^0$ can be computed by a depth-3 majority circuit of
quasipolynomial size. Krause and Pudlák [KP97, §6] ask whether this simulation
can be improved, i.e., whether every function in AC$^0$ can be computed by a depth-2
majority circuit of quasipolynomial size. We recently gave a strong negative
answer to this question:
Theorem 3.1 ([She07a]). There is a function $F : \{0,1\}^n \to \{0,1\}$, explicitly given
and computable by an AC$^0$ circuit of depth 3, whose computation requires a
majority vote of $\exp(\Omega(n^{1/5}))$ threshold gates.

We proved Theorem 3.1 by exhibiting an AC$^0$ function with exponentially small
discrepancy. All previously known functions with exponentially small discrepancy
(e.g., [GHR92, Nis93]) contained parity or majority as a subfunction and
therefore could not be computed in AC$^0$. Buhrman et al. [BVW07] obtained,
independently of the author and with much different techniques, another AC$^0$ function
with exponentially small discrepancy, thereby also answering Krause and Pudlák's
question.

3.1 Bounding the Discrepancy via the Threshold Degree 

To construct an AC$^0$ function with small discrepancy, we developed in [She07a] a
novel technique for generating low-discrepancy functions, which we now describe.
This technique is not specialized in any way to AC$^0$ but, rather, is based on the
abstract notion of threshold degree.

For a Boolean function $f : \{0,1\}^n \to \{0,1\}$, recall from Section 1 that its
threshold degree $\deg_\pm(f)$ is the minimum degree of a polynomial $p(x_1, \ldots, x_n)$
with $p(x) > 0 \Leftrightarrow f(x) = 1$ and $p(x) < 0 \Leftrightarrow f(x) = 0$. In many cases [MP88],
it is straightforward to obtain strong lower bounds on the threshold degree. Since
the threshold degree is a measure of the complexity of a given Boolean function,
it is natural to wonder whether it can yield lower bounds on communication in a
suitable setting. As we prove in [She07a], this intuition turns out to be correct for
every $f$.

More precisely, fix a Boolean function $f : \{0,1\}^n \to \{0,1\}$ with threshold
degree $d$. Let $N$ be a given integer, $N \ge n$. In [She07a], we introduced the two-party
communication problem of computing

\[ f(x|_V), \]

where the Boolean string $x \in \{0,1\}^N$ is Alice's input and the set $V \subset \{1, 2, \ldots, N\}$
of size $|V| = n$ is Bob's input. The symbol $x|_V$ stands for the projection of $x$ onto
the indices in $V$, in other words, $x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_n}) \in \{0,1\}^n$, where
$i_1 < i_2 < \cdots < i_n$ are the elements of $V$. Intuitively, this problem models a situation in which
Alice and Bob's joint computation depends on only $n$ of the inputs $x_1, x_2, \ldots, x_N$.
Alice knows the values of all the inputs $x_1, x_2, \ldots, x_N$ but does not know which $n$
of them are relevant. Bob, on the other hand, knows which $n$ inputs are relevant
but does not know their values. As one would hope, it turns out that $d$ is a lower
bound on the communication requirements of this problem:

Theorem 3.2 (Degree/Discrepancy Theorem [She07a]). Let $f : \{0,1\}^n \to \{0,1\}$
be given with threshold degree $d \ge 1$. Let $N$ be a given integer, $N \ge n$. Define $F =
[f(x|_V)]_{x,V}$, where the rows are indexed by $x \in \{0,1\}^N$ and columns by $V \in \binom{[N]}{n}$.
Then

\[ \mathrm{disc}(F) \le \left( \frac{4 e n^2}{N d} \right)^{d/2}. \]




To our knowledge, Theorem 3.2 is the first use of the threshold degree to prove
communication lower bounds. Given a function $f$ with threshold degree $d$,
Theorem 3.2 generates a communication problem with discrepancy at most $2^{-d}$ (by
setting $N \ge 16 e n^2 / d$). This exponentially small discrepancy immediately gives an
$\Omega(d)$ lower bound on communication in a variety of models (deterministic,
nondeterministic, randomized, quantum with and without entanglement; see Section 2.2).
Moreover, the resulting lower bounds on communication remain valid when Alice
and Bob merely seek to predict the answer with nonnegligible advantage, a critical
aspect for lower bounds against threshold circuits.
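The parameter setting quoted above is consistent with a discrepancy bound of the form $(4 e n^2 / (N d))^{d/2}$: once $N \ge 16 e n^2 / d$, the base is at most 1/4 and the bound drops to $2^{-d}$. The sketch below merely evaluates that expression numerically; the function names are assumptions of this note.

```python
import math

def degree_discrepancy_bound(n, N, d):
    """Evaluate (4 e n^2 / (N d))^(d/2), the assumed form of the discrepancy bound."""
    return (4 * math.e * n * n / (N * d)) ** (d / 2)

def sufficient_N(n, d):
    """Smallest integer N with N >= 16 e n^2 / d."""
    return math.ceil(16 * math.e * n * n / d)
```

For example, with $n = 100$ and $d = 10$ the helper `sufficient_N` returns a value of roughly $4.3 \cdot 10^4$, and the bound evaluated at that $N$ is below $2^{-10}$. Smaller values of $N$ give weaker bounds, and for $N \le 4 e n^2 / d$ the expression exceeds 1 and says nothing.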

We will give a detailed proof of the Degree/Discrepancy Theorem in the next
subsection. For now we will briefly sketch how we used it in [She07a] to prove the
main result of that paper, Theorem 3.1 above, on the existence of an AC$^0$ function
that requires a depth-2 majority circuit of exponential size. Consider the function

\[ f(x) = \bigvee_{i=1}^{m} \bigwedge_{j=1}^{4m^2} x_{ij}, \]

for which Minsky and Papert [MP88] showed that $\deg_\pm(f) = m$. Since $f$ has high
threshold degree, an application of Theorem 3.2 to $f$ yields a communication problem
with low discrepancy. This communication problem itself can be viewed as an
AC$^0$ circuit of depth 3. Recalling that its discrepancy is exponentially small, we
conclude that it cannot be computed by a depth-2 majority circuit of subexponential
size.

3.2 Proof of the Degree/Discrepancy Theorem

A key ingredient in our proof is the following dual characterization of the threshold
degree, which is a classical result known in greater generality as Gordan's
Transposition Theorem [Sch98, §7.8]:

Theorem 3.3. Let $f : \{0,1\}^n \to \{0,1\}$ be arbitrary, $d$ a nonnegative integer. Then
exactly one of the following holds: (1) $f$ has threshold degree at most $d$; (2) there is
a distribution $\mu$ over $\{0,1\}^n$ such that $\mathbf{E}_{x \sim \mu}[(-1)^{f(x)} \chi_S(x)] = 0$ for $|S| = 0, 1, \ldots, d$.

Theorem 3.3 follows from linear-programming duality. We will also make the
following simple observation.

Observation 3.4. Let $\kappa(x)$ be a probability distribution on $\{0,1\}^r$. Fix $i_1, \ldots, i_r
\in \{1, 2, \ldots, r\}$. Then $\sum_{x \in \{0,1\}^r} \kappa(x_{i_1}, \ldots, x_{i_r}) \le 2^{r - |\{i_1, \ldots, i_r\}|}$, where $|\{i_1, \ldots, i_r\}|$ denotes
the number of distinct integers among $i_1, \ldots, i_r$.
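Both ingredients, the dual witness of Theorem 3.3 and Observation 3.4, can be sanity-checked on toy instances. The sketch below is illustrative only; the function names, the dictionary representation of distributions, and the choice of XOR as the example are assumptions of this note. Subsets $S$ use 0-based positions, while the indices $i_1, \ldots, i_r$ are 1-based as in the observation.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i over i in S), with 0-based positions."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def is_dual_witness(f, mu, n, d):
    """True if E_{x~mu}[(-1)^f(x) chi_S(x)] = 0 for every S with |S| <= d."""
    for size in range(d + 1):
        for S in combinations(range(n), size):
            if sum(w * (-1) ** f(x) * chi(S, x) for x, w in mu.items()) != 0:
                return False
    return True

def observation_sum(kappa, indices):
    """Sum over x in {0,1}^r of kappa(x_{i_1}, ..., x_{i_r}), with 1-based indices."""
    r = len(indices)
    return sum(kappa[tuple(x[i - 1] for i in indices)]
               for x in product((0, 1), repeat=r))

def xor2(x):
    """XOR of two bits."""
    return x[0] ^ x[1]

# Uniform distribution on {0,1}^2, used as a candidate dual witness for XOR.
uniform2 = {x: Fraction(1, 4) for x in product((0, 1), repeat=2)}
```

The uniform distribution is a witness for XOR with $d = 1$, so by the dichotomy in Theorem 3.3 the threshold degree of two-bit XOR exceeds 1. For the observation, a point mass on $(1, 1, 1)$ with $i_1 = i_2 = i_3 = 1$ gives the sum $4 = 2^{3-1}$, showing that the bound can be tight.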

We are now ready for the proof of the Degree/Discrepancy Theorem. 

Theorem 3.2 (Restated). Let $f : \{0,1\}^n \to \{0,1\}$ be given with threshold
degree $d \ge 1$. Let $N$ be a given integer, $N \ge n$. Define $F = [f(x|_V)]_{x,V}$, where
the rows are indexed by $x \in \{0,1\}^N$ and columns by $V \in \binom{[N]}{n}$. Then

\[ \mathrm{disc}(F) \le \left( \frac{4 e n^2}{N d} \right)^{d/2}. \]

Proof [She07a]. Let $\mu$ be a probability distribution over $\{0,1\}^n$ with respect to
which $\mathbf{E}_{z \sim \mu}[(-1)^{f(z)} p(z)] = 0$ for every real-valued function $p$ of $d - 1$ or fewer
of the variables $z_1, \ldots, z_n$. The existence of $\mu$ is assured by Theorem 3.3. We will
analyze the discrepancy of $F$ with respect to the distribution

\[ \lambda(x, V) = 2^{-N+n} \binom{N}{n}^{-1} \mu(x|_V). \]

Define $\psi : \{0,1\}^n \to \mathbb{R}$ by $\psi(z) = (-1)^{f(z)} \mu(z)$. By Theorem 2.2,

\[ \mathrm{disc}_\lambda(F)^2 \le 4^n \mathop{\mathbf{E}}_{V, W} |\Gamma(V, W)|, \tag{3.1} \]

where we put $\Gamma(V, W) = \mathbf{E}_x[\psi(x|_V) \psi(x|_W)]$. To analyze this expression, we prove
two key claims.

Claim 3.5. Assume that $|V \cap W| \le d - 1$. Then $\Gamma(V, W) = 0$.

Proof. The claim is immediate from the fact that the Fourier transform of $\psi$ is
supported on characters of order $d$ and higher. For completeness, we will now give
a more detailed and elementary explanation. Assume for notational convenience
that $V = \{1, 2, \ldots, n\}$. Then

\begin{align*}
\Gamma(V, W) &= \mathop{\mathbf{E}}_x \bigl[ \mu(x_1, \ldots, x_n) (-1)^{f(x_1, \ldots, x_n)} \psi(x|_W) \bigr] \\
&= \frac{1}{2^N} \sum_{x_1, \ldots, x_n} \mu(x_1, \ldots, x_n) (-1)^{f(x_1, \ldots, x_n)} \sum_{x_{n+1}, \ldots, x_N} \psi(x|_W) \\
&= \frac{1}{2^N} \mathop{\mathbf{E}}_{(x_1, \ldots, x_n) \sim \mu} \Bigl[ (-1)^{f(x_1, \ldots, x_n)} \cdot \underbrace{\sum_{x_{n+1}, \ldots, x_N} \psi(x|_W)}_{*} \Bigr].
\end{align*}

Since $|V \cap W| \le d - 1$, the starred expression is a real-valued function of at most
$d - 1$ variables. The claim follows by the definition of $\mu$. □



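Claim 3.6. Assume that $|V \cap W| = i$. Then $|\Gamma(V, W)| \le 2^{i - 2n}$.

Proof. The claim is immediate from Observation 3.4. For completeness, we will
give a more detailed explanation. For notational convenience, assume that

\begin{align*}
V &= \{1, 2, \ldots, n\}, \\
W &= \{1, 2, \ldots, i\} \cup \{n + 1, n + 2, \ldots, n + (n - i)\}.
\end{align*}

We have:

\begin{align*}
|\Gamma(V, W)| &\le \mathop{\mathbf{E}}_x \bigl[ |\psi(x|_V) \psi(x|_W)| \bigr] \\
&= \mathop{\mathbf{E}}_{x_1, \ldots, x_{2n-i}} \bigl[ \mu(x_1, \ldots, x_n) \, \mu(x_1, \ldots, x_i, x_{n+1}, \ldots, x_{2n-i}) \bigr] \\
&\le \underbrace{\mathop{\mathbf{E}}_{x_1, \ldots, x_n} \bigl[ \mu(x_1, \ldots, x_n) \bigr]}_{= 2^{-n}} \cdot \underbrace{\max_{x_1, \ldots, x_i} \mathop{\mathbf{E}}_{x_{n+1}, \ldots, x_{2n-i}} \bigl[ \mu(x_1, \ldots, x_i, x_{n+1}, \ldots, x_{2n-i}) \bigr]}_{\le 2^{-(n-i)}}.
\end{align*}

The bounds $2^{-n}$ and $2^{-(n-i)}$ follow because $\mu$ is a probability distribution. □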
In view of Claims 3.5 and 3.6, inequality (3.1) simplifies to

\[ \mathrm{disc}_\lambda(F)^2 \le \sum_{i=d}^{n} 2^i \, \mathbf{P}[\,|V \cap W| = i\,], \]

which completes the proof of Theorem 3.2 after some routine calculations. □
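The final sum can be evaluated exactly for concrete parameters. The sketch below assumes, as in the proof, that $V$ and $W$ are independent uniformly random size-$n$ subsets of $\{1, \ldots, N\}$, so that $|V \cap W|$ follows a hypergeometric distribution; the function names are assumptions of this note. It can be compared against $(4 e n^2 / (N d))^d$, the square of a bound of the form $(4 e n^2 / (N d))^{d/2}$.

```python
from fractions import Fraction
from math import comb, e

def intersection_pmf(N, n, i):
    """P[|V ∩ W| = i] for independent uniform size-n subsets V, W of {1, ..., N}."""
    return Fraction(comb(n, i) * comb(N - n, n - i), comb(N, n))

def squared_discrepancy_sum(N, n, d):
    """Evaluate sum_{i=d}^{n} 2^i * P[|V ∩ W| = i] exactly."""
    return sum(2 ** i * intersection_pmf(N, n, i) for i in range(d, n + 1))

def closed_form_square(N, n, d):
    """Evaluate (4 e n^2 / (N d))^d as a float."""
    return (4 * e * n * n / (N * d)) ** d
```

For $n = 6$, $d = 3$, and $N = 400$, the exact sum is several orders of magnitude below the closed-form expression, which reflects the slack in the routine calculations that the proof alludes to.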

The discrepancy bound in Theorem 3.2 is not tight. In follow-up work (see
Section 5.1), the author proved a substantially stronger bound using matrix-analytic
techniques. However, that matrix-analytic approach does not seem to extend to the
multiparty model, and as we will see later in Sections 6 and 7, all multiparty papers
in this survey use adaptations of the analysis just presented.



4 The Generalized Discrepancy Method 

As we saw in Section 2.2, the discrepancy method is particularly strong in that
it gives communication lower bounds not only for bounded-error protocols but
also for protocols with error vanishingly close to $\frac{1}{2}$. Ironically, this strength of the
discrepancy method is also its weakness. For example, the disjointness function
$\mathrm{Disj}(x, y) = \bigvee_{i=1}^{n} (x_i \wedge y_i)$ has a simple low-cost protocol with error $\frac{1}{2} - \Omega(\frac{1}{n})$.
As a result, disjointness has high discrepancy, and no useful lower bounds can
be obtained for it via the discrepancy method. Yet it is well-known that
disjointness has bounded-error communication complexity $\Omega(n)$ in the randomized
model [KS92, Raz92] and $\Omega(\sqrt{n})$ in the quantum model [Raz03].
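One way to realize a constant-cost protocol with error $\frac{1}{2} - \Omega(\frac{1}{n})$ is as follows; the text does not spell out a specific protocol, so this construction and all names in the code are assumptions made for illustration. The players use shared randomness to pick a uniformly random index $i$ and exchange $x_i$ and $y_i$. If $x_i = y_i = 1$ they output 1; otherwise they output 1 with probability $p = (n-1)/(2n-1)$ and 0 with the remaining probability. The code computes the exact error on every input pair.

```python
from fractions import Fraction
from itertools import product

def disj(x, y):
    """Disj(x, y) = OR over i of (x_i AND y_i), as defined in the text."""
    return int(any(a & b for a, b in zip(x, y)))

def protocol_error(x, y):
    """Exact error probability of the illustrative sampling protocol on input (x, y)."""
    n = len(x)
    p = Fraction(n - 1, 2 * n - 1)           # bias of the fallback coin
    hit = Fraction(sum(a & b for a, b in zip(x, y)), n)  # P[sampled index has x_i = y_i = 1]
    prob_output_1 = hit + (1 - hit) * p
    return 1 - prob_output_1 if disj(x, y) else prob_output_1

def worst_case_error(n):
    """Maximum error of the protocol over all input pairs of length n."""
    inputs = list(product((0, 1), repeat=n))
    return max(protocol_error(x, y) for x in inputs for y in inputs)
```

The worst-case error works out to $(n-1)/(2n-1) = \frac{1}{2} - \frac{1}{2(2n-1)}$, attained both when the strings do not intersect and when they intersect in exactly one position.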

The remainder of this survey (Sections 5-7) is concerned with bounded-error
communication. Crucial to this development is the generalized discrepancy
method, an ingenious extension of the traditional discrepancy method that
avoids the difficulty just cited. To our knowledge, this idea originated in a
paper by Klauck [Kla01, Thm. 4] and was reformulated in its current form by
Razborov [Raz03]. The development in [Kla01] and [Raz03] takes place in the
quantum model of communication. However, the basic mathematical technique is
in no way restricted to the quantum model, and we will focus here on a
model-independent version of the generalized discrepancy method from [She07b, §2.4].

Specifically, consider an arbitrary communication model and let $f : X \times Y \to \{0, 1\}$
be a given function whose communication complexity we wish to estimate.
Suppose we can find a function $h : X \times Y \to \{0, 1\}$ and a distribution $\mu$ on $X \times Y$
that satisfy the following two properties.

1. Correlation of $f$ and $h$. The functions $f$ and $h$ are well correlated under $\mu$:

\[ \mathbf{E}_{(x,y)\sim\mu}\left[(-1)^{f(x,y)+h(x,y)}\right] \ge \epsilon, \tag{4.1} \]

where $\epsilon > 0$ is typically a constant.

2. Hardness of $h$. No low-cost protocol $\Pi$ in the given model of communication
can compute $h$ to a substantial advantage under $\mu$. Formally, if $\Pi$ is a protocol
in the given model with cost $C$, then

\[ \left| \mathbf{E}_{(x,y)\sim\mu}\left[(-1)^{h(x,y)} \, \mathbf{E}\left[(-1)^{\Pi(x,y)}\right]\right] \right| \le 2^{O(C)} \gamma, \tag{4.2} \]

where $\gamma = o(1)$. The inner expectation in (4.2) is over the internal operation
of the protocol on the fixed input $(x, y)$.

If the above two conditions hold, we claim that any protocol in the given model that
computes $f$ with error at most $\epsilon/3$ on each input must have cost $\Omega(\log(\epsilon/\gamma))$. Indeed,
let $\Pi$ be a protocol with $\mathbf{P}[\Pi(x,y) \ne f(x,y)] \le \epsilon/3$ for all $x, y$. Then standard
manipulations reveal:

\[ \mathbf{E}_{(x,y)\sim\mu}\left[(-1)^{h(x,y)} \, \mathbf{E}\left[(-1)^{\Pi(x,y)}\right]\right] \ge \mathbf{E}_{(x,y)\sim\mu}\left[(-1)^{f(x,y)+h(x,y)}\right] - 2 \cdot \frac{\epsilon}{3} \ge \frac{\epsilon}{3}. \]

In view of (4.2), this shows that $\Pi$ must have cost $\Omega(\log(\epsilon/\gamma))$.
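The "standard manipulations" can be spelled out as follows. This is only a sketch, written under the assumption that the protocol outputs a bit, so that $(-1)^{\Pi(x,y)}$ is its $\pm 1$ output; the symbol $p$ is introduced here for illustration.

```latex
% Fix an input (x,y) and let p = P[\Pi(x,y) \ne f(x,y)] \le \epsilon/3.
% The protocol outputs f(x,y) with probability 1-p and its negation otherwise, so
\mathbf{E}\bigl[(-1)^{\Pi(x,y)}\bigr] = (1-2p)\,(-1)^{f(x,y)},
\qquad
\Bigl|\,\mathbf{E}\bigl[(-1)^{\Pi(x,y)}\bigr] - (-1)^{f(x,y)}\Bigr| = 2p \le \frac{2\epsilon}{3}.

% Multiply by (-1)^{h(x,y)}, average over (x,y) \sim \mu, and use the
% correlation property of f and h:
\mathbf{E}_{\mu}\Bigl[(-1)^{h}\,\mathbf{E}\bigl[(-1)^{\Pi}\bigr]\Bigr]
  \;\ge\; \mathbf{E}_{\mu}\bigl[(-1)^{f+h}\bigr] - \frac{2\epsilon}{3}
  \;\ge\; \epsilon - \frac{2\epsilon}{3} \;=\; \frac{\epsilon}{3}.

% Combine with the hardness property of h, for a protocol of cost C:
\frac{\epsilon}{3} \;\le\; 2^{O(C)}\,\gamma
\quad\Longrightarrow\quad
C \;\ge\; \Omega\!\left(\log\frac{\epsilon}{\gamma}\right).
```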
The above framework from [She07b] is meant to emphasize the basic
mathematical technique in question, which is independent of the communication model.
Indeed, the communication model enters the picture only in (4.2). It is here that
the analysis must exploit the particularities of the model. To place an upper bound
on the advantage under $\mu$ in the quantum model with entanglement, one considers
the quantity $\|K\| \sqrt{|X|\,|Y|}$, where $K = [(-1)^{h(x,y)} \mu(x, y)]_{x,y}$. In the randomized model
and the quantum model without entanglement, the quantity to estimate happens to
be $\mathrm{disc}_\mu(h)$. (In fact, Linial and Shraibman [LS07b] recently showed that $\mathrm{disc}_\mu(h)$
also works in the quantum model with entanglement.)

For future reference, we now record a quantitative version of the generalized 
discrepancy method for the quantum model. 

Theorem 4.1 ([She07b], implicit in [Raz03, SZ07]). Let $X, Y$ be finite sets and
$f : X \times Y \to \{0, 1\}$ a given function. Let $K = [K_{xy}]_{x \in X, y \in Y}$ be any real matrix with
$\|K\|_1 = 1$. Then for each $\epsilon > 0$,

\[ 4^{Q_\epsilon(f)} \ge 4^{Q^*_\epsilon(f)} \ge \frac{\langle F, K \rangle - 2\epsilon}{3 \, \|K\| \sqrt{|X|\,|Y|}}, \]

where $F = \left[(-1)^{f(x,y)}\right]_{x \in X, y \in Y}$.

Observe that Theorem 4.1 uses slightly more succinct notation (matrix vs. function;
weighted sum vs. expectation) but is equivalent to the abstract formulation above.

So far, we have focused on two-party communication. This discussion extends
essentially word-for-word to the multiparty model, with discrepancy serving once
again as the natural measure of the advantage attainable by low-cost protocols.
This extension was formalized by Lee and Shraibman [LS07, Thms. 6, 7] and
independently by Chattopadhyay and Ada [CA08, Lem. 3.2], who proved (4.3) and
(4.4) below, respectively:

Theorem 4.2 (cf. [LS07, CA08]). Fix $F : X_1 \times \cdots \times X_k \to \{-1, +1\}$ and $\epsilon \in [0, 1/2)$.
Then

\[ 2^{R_\epsilon(F)} \ge (1 - \epsilon) \max_{H,P} \left\{ \frac{\langle H \circ P, F \rangle - \frac{\epsilon}{1-\epsilon}}{\mathrm{disc}_P(H)} \right\} \tag{4.3} \]

and

\[ 2^{R_\epsilon(F)} \ge \max_{H,P} \left\{ \frac{\langle H \circ P, F \rangle - 2\epsilon}{\mathrm{disc}_P(H)} \right\}, \tag{4.4} \]

where in both cases $H$ ranges over sign tensors and $P$ ranges over tensors with
$P \ge 0$ and $\|P\|_1 = 1$.
Proof. Fix an optimal $\epsilon$-error protocol $\Pi$ for $F$. Define $\tilde{F}(x_1, \ldots, x_k) =
\mathbf{E}[(-1)^{\Pi(x_1,\ldots,x_k)}]$, where the expectation is over any internal randomization in $\Pi$.
Let $\delta \in (0, 1]$ be a parameter to be fixed later. Then

\[ 2^{R_\epsilon(F)} \, \mathrm{disc}_P(H) \ge \langle H \circ P, \tilde{F} \rangle
   = \delta \langle H \circ P, F \rangle + \langle H \circ P, \tilde{F} - \delta F \rangle
   \ge \delta \langle H \circ P, F \rangle - \max\{|1 - \delta - 2\epsilon|, \, 1 - \delta\}, \]

where the first inequality restates the original discrepancy method (Theorem 2.3).
Now (4.3) and (4.4) follow by setting $\delta = 1 - \epsilon$ and $\delta = 1$, respectively. □
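The two parameter choices in the proof can be checked mechanically. The sketch below is illustrative only (the function names are ours, not from the cited papers): it evaluates the quantity $\delta c - \max\{|1-\delta-2\epsilon|, 1-\delta\}$, where $c$ stands for the correlation $\langle H \circ P, F\rangle$, and confirms that $\delta = 1-\epsilon$ and $\delta = 1$ give the numerators of the two bounds. It also shows that the first exceeds the second by exactly $\epsilon(1-c)$, so the first bound is never weaker when $c \le 1$.

```python
from fractions import Fraction

def numerator_from_delta(corr, eps, delta):
    """delta*corr - max(|1 - delta - 2*eps|, 1 - delta), as in the proof."""
    return delta * corr - max(abs(1 - delta - 2 * eps), 1 - delta)

def numerator_first_bound(corr, eps):
    """(1 - eps) * (corr - eps/(1 - eps)), the numerator side of the first bound."""
    return (1 - eps) * (corr - eps / (1 - eps))

def numerator_second_bound(corr, eps):
    """corr - 2*eps, the numerator of the second bound."""
    return corr - 2 * eps

# Exact rational grid: error eps in [0, 1/2), correlation corr in [0, 1].
EPS_GRID = [Fraction(0), Fraction(1, 10), Fraction(1, 3), Fraction(49, 100)]
CORR_GRID = [Fraction(0), Fraction(1, 2), Fraction(2, 3), Fraction(1)]

def gap(corr, eps):
    """How much the first bound's numerator exceeds the second's."""
    return numerator_first_bound(corr, eps) - numerator_second_bound(corr, eps)
```

For instance, `gap(Fraction(2, 3), Fraction(1, 3))` evaluates to `Fraction(1, 9)`, which is $\epsilon(1-c)$ for those values.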

The proof in [CA08] is similar to the one just given for the special case $\delta = 1$.
The proof in [LS07] is rather different and works by defining a suitable norm and
passing to its dual. The norm-based approach was employed earlier by Linial and
Shraibman [LS07b] and can be thought of as a purely analytic analogue of the
generalized discrepancy method.



5 Two-Party Bounded-Error Communication 

For a function $f : \{0, 1\}^n \to \mathbb{R}$, recall from Section 1 that its $\epsilon$-approximate degree
$\deg_\epsilon(f)$ is the least degree of a polynomial $p(x_1, \ldots, x_n)$ with $|f(x) - p(x)| \le \epsilon$ for
all $x \in \{0, 1\}^n$. We move on to discuss two recent papers on bounded-error
communication that use the notion of approximate degree to contribute strong lower
bounds for rather broad classes of functions, subsuming Razborov's breakthrough
work on symmetric predicates [Raz03]. These lower bounds are valid not only in
the randomized model, but also in the quantum model (regardless of entanglement).

The setting in which to view these two works is Klauck and Razborov's
generalized discrepancy method (see Sections 1 and 4). Let $F$ be a sign matrix whose
bounded-error quantum communication complexity is of interest. The quantum
version of this method (Theorem 4.1) states that to prove a communication lower
bound for $F$, it suffices to exhibit a real matrix $K$ such that $\langle F, K \rangle$ is large but $\|K\|$
is small. The importance of the generalized discrepancy method is that it makes
it possible, in theory, to prove lower bounds for functions such as disjointness, to
which the traditional discrepancy method (Theorem 2.3) does not apply.

The hard part, of course, is finding the matrix $K$. Except in rather restricted
cases [Kla01, Thm. 4], it was not known how to do it. As a result, the
generalized discrepancy method was of limited practical use. (In particular, Razborov's
celebrated work [Raz03] did not use the generalized discrepancy method. Instead,
he introduced a novel alternate technique that was restricted to symmetric
functions.) This difficulty was overcome independently by Sherstov [She07b] and Shi
and Zhu [SZ07], who used the dual characterization of the approximate degree to
obtain the matrix $K$ for a broad range of problems. To our knowledge, the work
in [She07b] and [SZ07] is the first use of the dual characterization of the
approximate degree to prove communication lower bounds.

The specifics of these two works are very different. The construction of $K$
in [She07b], which we called the pattern matrix method for lower bounds on
bounded-error communication, is built around a new matrix-analytic technique
(the pattern matrix) inspired by the author's Degree/Discrepancy Theorem. The
construction of $K$ in [SZ07], the block-composition method, is based on the idea
of hardness amplification by composition. What unites them is use of the dual
characterization of the approximate degree, given by the following theorem.

Theorem 5.1 ([She07b, SZ07]). Fix $\epsilon \ge 0$. Let $f : \{0, 1\}^n \to \mathbb{R}$ be given with
$d = \deg_\epsilon(f) \ge 1$. Then there is a function $\psi : \{0, 1\}^n \to \mathbb{R}$ such that:

\[ \hat{\psi}(S) = 0 \quad \text{for } |S| < d, \]

\[ \sum_{z \in \{0,1\}^n} |\psi(z)| = 1, \]

\[ \sum_{z \in \{0,1\}^n} \psi(z) f(z) > \epsilon. \]

Theorem 5.1 follows from linear-programming duality. We shall first cover the
two papers individually in Sections 5.1 and 5.2 and then compare them in detail
in Section 5.3.
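To make the three conditions on the dual object concrete, here is a small sketch that checks them by brute force for a toy example of our own choosing: the $\pm 1$-valued parity function on $n = 3$ bits with $\psi = 2^{-n} \cdot \mathrm{parity}$. The helper names are hypothetical, and the Fourier normalization $\hat{\psi}(S) = 2^{-n} \sum_z \psi(z) \chi_S(z)$ is an assumption about the survey's convention. By the easy direction of duality, any such $\psi$ certifies that no polynomial of degree less than $d$ approximates $f$ within $\epsilon$.

```python
from fractions import Fraction
from itertools import product

def fourier_coefficient(psi, n, S):
    """psi_hat(S) = 2^{-n} * sum_z psi(z) * chi_S(z), chi_S(z) = (-1)^{sum_{i in S} z_i}."""
    total = Fraction(0)
    for z in product((0, 1), repeat=n):
        sign = -1 if sum(z[i] for i in S) % 2 else 1
        total += psi(z) * sign
    return total / 2**n

def is_dual_witness(psi, f, n, d, eps):
    """Check the three conditions: low-order coefficients vanish, unit l1 norm, correlation > eps."""
    cube = list(product((0, 1), repeat=n))
    subsets = [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2**n)]
    low_order_vanish = all(fourier_coefficient(psi, n, S) == 0
                           for S in subsets if len(S) < d)
    unit_l1 = sum(abs(psi(z)) for z in cube) == 1
    correlated = sum(psi(z) * f(z) for z in cube) > eps
    return low_order_vanish and unit_l1 and correlated

n = 3
parity = lambda z: (-1) ** sum(z)           # +/-1-valued parity (toy example)
psi = lambda z: Fraction(parity(z), 2**n)   # candidate dual object
constant_one = lambda z: 1                  # a function psi is NOT correlated with
```

Here `is_dual_witness(psi, parity, n, 3, Fraction(2, 3))` holds, reflecting the fact that parity on 3 bits cannot be approximated within 2/3 by any polynomial of degree below 3, whereas the same `psi` fails as a witness for `constant_one`.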

5.1 The Pattern Matrix Method 

The setting for this work resembles that of the Degree/Discrepancy Theorem
in [She07a] (see Section 3). Let $N$ and $n$ be positive integers, where $n \le N/2$.
For convenience, we will further assume that $n \mid N$. Fix an arbitrary function
$f : \{0, 1\}^n \to \{0, 1\}$. Consider the communication problem of computing

\[ f(x|_V), \]

where the bit string $x \in \{0, 1\}^N$ is Alice's input and the set $V \subset \{1, 2, \ldots, N\}$ with
$|V| = n$ is Bob's input. As before, $x|_V$ denotes the projection of $x$ onto the indices
in $V$, i.e., $x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_n}) \in \{0, 1\}^n$, where $i_1 < i_2 < \cdots < i_n$ are the elements
of $V$.
The similarities with [She07a], however, do not extend beyond this point.
Unlike that earlier work, we will actually study the easier communication problem
in which Bob's input $V$ is restricted to a rather special form. Namely, we will
only allow those sets $V$ that contain precisely one element from each block in the
following partition of $\{1, 2, \ldots, N\}$:

\[ \left\{1, 2, \ldots, \frac{N}{n}\right\} \cup \left\{\frac{N}{n} + 1, \ldots, \frac{2N}{n}\right\} \cup \cdots \cup \left\{\frac{(n-1)N}{n} + 1, \ldots, N\right\}. \tag{5.1} \]

Even for this easier communication problem, we will prove a much stronger
result than what would have been possible in the original setting with the methods
of [She07a]. In particular, we will considerably improve the Degree/Discrepancy
Theorem from [She07a] along the way. The main results of this work are as
follows.

Theorem 5.2 ([She07b]). Any classical or quantum protocol, with or without prior
entanglement, that computes $f(x|_V)$ with error probability at most $1/5$ on each
input has communication cost at least

\[ \frac{1}{4} \deg_{1/3}(f) \cdot \log\left(\frac{N}{2n}\right) - 2. \]

In view of the restricted form of Bob's inputs, we can restate Theorem 5.2 in
terms of function composition. Setting $N = 4n$ for concreteness, we have:

Corollary 5.3 ([She07b]). Let $f : \{0, 1\}^n \to \{0, 1\}$ be given. Define $F : \{0, 1\}^{4n} \times
\{0, 1\}^{4n} \to \{0, 1\}$ by

\[ F(x, y) = f\bigl( x_1 y_1 \vee x_2 y_2 \vee x_3 y_3 \vee x_4 y_4, \;\; x_5 y_5 \vee x_6 y_6 \vee x_7 y_7 \vee x_8 y_8, \;\; \ldots, \;\;
   x_{4n-3} y_{4n-3} \vee x_{4n-2} y_{4n-2} \vee x_{4n-1} y_{4n-1} \vee x_{4n} y_{4n} \bigr), \]

where $x_i y_i = (x_i \wedge y_i)$. Any classical or quantum protocol, with or without prior
entanglement, that computes $F(x, y)$ with error probability at most $1/5$ on each
input has cost at least $\frac{1}{4} \deg_{1/3}(f) - 2$.

We now turn to the proof. Let $\mathcal{V}(N, n)$ denote the set of Bob's inputs, i.e., the
family of subsets $V \subseteq [N]$ that have exactly one element in each of the blocks of the
partition (5.1). Clearly, $|\mathcal{V}(N, n)| = (N/n)^n$. We will be working with the following
family of matrices.
Definition 5.4 (Pattern matrix [She07b]). For $\varphi : \{0, 1\}^n \to \mathbb{R}$, the $(N, n, \varphi)$-pattern
matrix is the real matrix $A$ given by

\[ A = \bigl[\varphi(x|_V \oplus w)\bigr]_{x \in \{0,1\}^N, \; (V, w) \in \mathcal{V}(N,n) \times \{0,1\}^n}. \]

In words, $A$ is the matrix of size $2^N$ by $2^n (N/n)^n$ whose rows are indexed by
strings $x \in \{0, 1\}^N$, whose columns are indexed by pairs $(V, w) \in \mathcal{V}(N, n) \times \{0, 1\}^n$,
and whose entries are given by $A_{x,(V,w)} = \varphi(x|_V \oplus w)$. The logic behind the term
"pattern matrix" is as follows: a mosaic arises from repetitions of a pattern in the
same way that $A$ arises from applications of $\varphi$ to various subsets of the variables.

Our intermediate goal will be to determine the spectral norm of any given
pattern matrix $A$. Toward that end, we will actually end up determining every singular
value of $A$ and its multiplicity. Our approach will be to represent $A$ as the sum of
simpler matrices and analyze them instead. For this to work, we need to be able to
reconstruct the singular values of $A$ from those of the simpler matrices. Just when
this can be done is the subject of the following lemma from [She07b].

Lemma 5.5 (Singular values of a matrix sum [She07b]). Let $A, B$ be real
matrices with $AB^T = 0$ and $A^T B = 0$. Then the nonzero singular values of $A + B$,
counting multiplicities, are $\sigma_1(A), \ldots, \sigma_{\mathrm{rank}\,A}(A), \sigma_1(B), \ldots, \sigma_{\mathrm{rank}\,B}(B)$.

We are ready to analyze the singular values of a pattern matrix. 

Theorem 5.6 (Singular values of a pattern matrix [She07b]). Let $\varphi : \{0, 1\}^n \to
\mathbb{R}$ be given. Let $A$ be the $(N, n, \varphi)$-pattern matrix. Then the nonzero singular values
of $A$, counting multiplicities, are:

\[ \bigcup_{S : \hat{\varphi}(S) \ne 0} \left\{ \sqrt{2^{N+n} \left(\frac{N}{n}\right)^{n}} \cdot |\hat{\varphi}(S)| \left(\frac{n}{N}\right)^{|S|/2}, \;\; \text{repeated } \left(\frac{N}{n}\right)^{|S|} \text{ times} \right\}. \]

In particular,

\[ \|A\| = \sqrt{2^{N+n} \left(\frac{N}{n}\right)^{n}} \; \max_{S \subseteq [n]} \left\{ |\hat{\varphi}(S)| \left(\frac{n}{N}\right)^{|S|/2} \right\}. \]

Proof [She07b]. For each $S \subseteq [n]$, let $A_S$ be the $(N, n, \chi_S)$-pattern matrix. Then
$A = \sum_{S \subseteq [n]} \hat{\varphi}(S) A_S$. For any $S, T \subseteq [n]$ with $S \ne T$, a calculation reveals that
$A_S A_T^T = 0$ and $A_S^T A_T = 0$. By Lemma 5.5, this means that the nonzero singular
values of $A$ are the union of the nonzero singular values of all $\hat{\varphi}(S) A_S$, counting
multiplicities. Therefore, the proof will be complete once we show that the only
nonzero singular value of $A_S^T A_S$ is $2^{N+n} (N/n)^{n-|S|}$, with multiplicity $(N/n)^{|S|}$.
For this, it is convenient to write $A_S^T A_S$ as the Kronecker product

\[ A_S^T A_S = \bigl[\chi_S(w) \chi_S(w')\bigr]_{w, w'} \otimes \Bigl[ \sum_{x \in \{0,1\}^N} \chi_S(x|_V) \, \chi_S(x|_{V'}) \Bigr]_{V, V'}. \]

The first matrix in this factorization has rank 1 and entries $\pm 1$, which means that
its only nonzero singular value is $2^n$ with multiplicity 1. The other matrix, call
it $M$, is permutation-similar to $2^N \, \mathrm{diag}(J, J, \ldots, J)$, where $J$ is the all-ones square
matrix of order $(N/n)^{n-|S|}$. This means that the only nonzero singular value of $M$
is $2^N (N/n)^{n-|S|}$ with multiplicity $(N/n)^{|S|}$. It follows from elementary properties of
the Kronecker product that the spectrum of $A_S^T A_S$ is as desired. □
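The key step of the proof can be verified by brute force on a tiny instance. The sketch below is our own illustration, not code from [She07b]: it builds the pattern matrix for a character $\chi_S$ with $N = 4$ and $n = 2$, forms $G = A_S^T A_S$ in exact integer arithmetic, and checks that $G^2 = \lambda G$ with $\lambda = 2^{N+n}(N/n)^{n-|S|}$. For a symmetric matrix this says every nonzero eigenvalue equals $\lambda$, and the multiplicity is then $\mathrm{trace}(G)/\lambda$. Indexing is 0-based, and $V$ is stored as one offset per block; both are conventions chosen here.

```python
from itertools import product

def pattern_matrix(phi, N, n):
    """Rows: x in {0,1}^N. Columns: pairs (V, w), V = one offset per block, w in {0,1}^n."""
    block = N // n  # block size N/n; assumes n divides N
    rows = list(product((0, 1), repeat=N))
    cols = [(V, w) for V in product(range(block), repeat=n)
                   for w in product((0, 1), repeat=n)]
    # Entry = phi(x|_V XOR w); the i-th selected index is i*block + V[i].
    return [[phi(tuple(x[i * block + V[i]] ^ w[i] for i in range(n)))
             for (V, w) in cols] for x in rows]

def character(S):
    """chi_S(z) = (-1)^{sum_{i in S} z_i}."""
    return lambda z: -1 if sum(z[i] for i in S) % 2 else 1

def gram(A):
    """Return A^T A as a list of lists."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def matmul(P, Q):
    Qt = list(zip(*Q))
    return [[sum(a * b for a, b in zip(row, col)) for col in Qt] for row in P]

def check_character_spectrum(S, N, n):
    """Return (G^2 == lam*G, multiplicity of lam) for G = A_S^T A_S."""
    G = gram(pattern_matrix(character(S), N, n))
    lam = 2 ** (N + n) * (N // n) ** (n - len(S))
    scaled = [[lam * g for g in row] for row in G]
    multiplicity = sum(G[i][i] for i in range(len(G))) // lam
    return matmul(G, G) == scaled, multiplicity
```

With $N = 4$, $n = 2$, the multiplicities come out as $(N/n)^{|S|} = 1, 2, 4$ for $|S| = 0, 1, 2$, matching the statement of the theorem.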

We are now prepared to formulate and prove the pattern matrix method for
lower bounds on bounded-error communication, which gives strong lower bounds
for every pattern matrix generated by a Boolean function with high approximate
degree. Theorem 5.2 and its corollary will fall out readily as consequences.

Theorem 5.7 (Pattern matrix method [She07b]). Let $F$ be the $(N, n, f)$-pattern
matrix, where $f : \{0, 1\}^n \to \{0, 1\}$ is given. Put $d = \deg_{1/3}(f)$. Then

\[ Q_{1/5}(F) \ge Q^*_{1/5}(F) \ge \frac{1}{4} \, d \log\left(\frac{N}{n}\right) - 2. \]

Proof [She07b]. Define $f^* : \{0, 1\}^n \to \{-1, +1\}$ by $f^*(z) = (-1)^{f(z)}$. Then it is easy
to verify that $\deg_{2/3}(f^*) = d$. By Theorem 5.1, there is a function $\psi : \{0, 1\}^n \to \mathbb{R}$
such that:

\[ \hat{\psi}(S) = 0 \quad \text{for } |S| < d, \tag{5.2} \]

\[ \sum_{z \in \{0,1\}^n} |\psi(z)| = 1, \tag{5.3} \]

\[ \sum_{z \in \{0,1\}^n} \psi(z) f^*(z) > \frac{2}{3}. \tag{5.4} \]

Let $M$ be the $(N, n, f^*)$-pattern matrix. Let $K$ be the $(N, n, 2^{-N}(N/n)^{-n}\psi)$-pattern
matrix. Immediate consequences of (5.3) and (5.4) are:

\[ \|K\|_1 = 1, \qquad \langle K, M \rangle > \frac{2}{3}. \tag{5.5} \]

Our last task is to calculate $\|K\|$. By (5.3) and Proposition 2.1,

\[ \max_{S \subseteq [n]} |\hat{\psi}(S)| \le 2^{-n}. \tag{5.6} \]

Theorem 5.6 yields, in view of (5.2) and (5.6):

\[ \|K\| \le \left(\frac{n}{N}\right)^{d/2} \left( 2^{N+n} \left(\frac{N}{n}\right)^{n} \right)^{-1/2}. \tag{5.7} \]

The desired lower bounds on quantum communication now follow directly from
(5.5) and (5.7) by the generalized discrepancy method (Theorem 4.1). □

Remark 5.8. In the proof of Theorem 5.7, we bounded $\|K\|$ using the subtle
calculations of the spectrum of a pattern matrix. Another possibility would be to
bound $\|K\|$ precisely in the same way that we bounded the discrepancy in the
Degree/Discrepancy Theorem (see Section 3). This, however, would result in
polynomially weaker lower bounds on communication.

Theorem 5.7 immediately implies Theorem 5.2 above and its corollary:

Proof of Theorem 5.2 [She07b]. The $\left(\left\lfloor \frac{N}{2n} \right\rfloor n, \, n, \, f\right)$-pattern matrix occurs as a
submatrix of $[f(x|_V)]_{x \in \{0,1\}^N, \, V \in \mathcal{V}(N,n)}$. □

Improved Degree/Discrepancy Theorem. We will mention a few more
applications of this work. The first of these is an improved version of the author's
Degree/Discrepancy Theorem (Theorem 3.2).

Theorem 5.9 ([She07b]). Let $F$ be the $(N, n, f)$-pattern matrix, where $f :
\{0, 1\}^n \to \{0, 1\}$ has threshold degree $d$. Then $\mathrm{disc}(F) \le (n/N)^{d/2}$.

The proof is similar to the proof of the pattern matrix method. Theorem 5.9
improves considerably on the original Degree/Discrepancy Theorem. To illustrate,
consider $f(x) = \bigvee_{i=1}^{m} \bigwedge_{j=1}^{4m^2} x_{ij}$, a function on $n = 4m^3$ variables. Applying
Theorem 5.9 to $f$ leads to an $\exp(-\Theta(n^{1/3}))$ upper bound on the discrepancy of AC$^0$,
improving on the previous bound of $\exp(-\Theta(n^{1/5}))$ from [She07a]. The $\exp(-\Theta(n^{1/3}))$
bound is also the bound obtained by Buhrman et al. [BVW07] independently of the
author [She07a, She07b], using a different function and different techniques.



Razborov's Lower Bounds for Symmetric Functions. As another application,
we are able to give an alternate proof of Razborov's breakthrough result on the
quantum communication complexity of symmetric functions [Raz03]. Consider a
communication problem in which Alice has a string $x \in \{0, 1\}^n$, Bob has a string
$y \in \{0, 1\}^n$, and their objective is to compute

\[ D(|x \wedge y|) \]

for some predicate $D : \{0, 1, \ldots, n\} \to \{0, 1\}$ fixed in advance. This general setting
encompasses several familiar functions, such as disjointness (determining if $x$ and
$y$ intersect) and inner product modulo 2 (determining if $x$ and $y$ intersect in an odd
number of positions).

As it turns out, the hardness of this general communication problem depends on
whether $D$ changes value close to the middle of the range $\{0, 1, \ldots, n\}$. Specifically,
define $\ell_0(D) \in \{0, 1, \ldots, \lfloor n/2 \rfloor\}$ and $\ell_1(D) \in \{0, 1, \ldots, \lceil n/2 \rceil\}$ to be the smallest
integers such that $D$ is constant in the range $[\ell_0(D), n - \ell_1(D)]$. Razborov established
optimal lower bounds on the quantum communication complexity of every function
of the form $D(|x \wedge y|)$:

Theorem 5.10 (Razborov [Raz03]). Let $D : \{0, 1, \ldots, n\} \to \{0, 1\}$ be an arbitrary
predicate. Put $f(x, y) = D(|x \wedge y|)$. Then

\[ Q_{1/3}(f) \ge Q^*_{1/3}(f) \ge \Omega\left(\sqrt{n \, \ell_0(D)} + \ell_1(D)\right). \]

In particular, disjointness has quantum communication complexity $\Omega(\sqrt{n})$,
regardless of entanglement. Prior to Razborov's result, the best lower bound [BW01,
ASTS+03] for disjointness was only $\Omega(\log n)$.
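As an illustration of how the two parameters are read off a predicate, the following sketch computes $\ell_0(D)$ and $\ell_1(D)$ and the expression inside the $\Omega(\cdot)$. The function names are ours. It uses the observation that any admissible interval $[\ell_0, n - \ell_1]$ contains the point $\lfloor n/2 \rfloor$, so each endpoint can be pushed outward independently while $D$ agrees with its value there. The returned number is only the expression $\sqrt{n \ell_0} + \ell_1$, not an actual communication cost, since the theorem hides constants.

```python
import math

def ell0_ell1(D, n):
    """Smallest l0 <= floor(n/2), l1 <= ceil(n/2) with D constant on [l0, n - l1]."""
    mid = n // 2
    ell0 = mid
    while ell0 > 0 and D(ell0 - 1) == D(mid):
        ell0 -= 1                      # extend the constant range to the left
    top = mid                          # top plays the role of n - l1
    while top < n and D(top + 1) == D(mid):
        top += 1                       # extend the constant range to the right
    return ell0, n - top

def razborov_expression(D, n):
    """sqrt(n * l0) + l1, the quantity inside the Omega(.) of the theorem."""
    l0, l1 = ell0_ell1(D, n)
    return math.sqrt(n * l0) + l1

intersects = lambda t: int(t >= 1)     # disjointness-type predicate
odd_overlap = lambda t: t % 2          # inner product modulo 2
```

For $n = 16$, the disjointness-type predicate has $(\ell_0, \ell_1) = (1, 0)$ and expression $\sqrt{16} = 4$, while inner product modulo 2 has $(\ell_0, \ell_1) = (8, 8)$, giving an expression linear in $n$.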

In [She07b], we give a new proof of Razborov's Theorem 5.10 using a
straightforward application of the pattern matrix method.

5.2 The Block Composition Method 

Given functions $f : \{0, 1\}^n \to \{0, 1\}$ and $g : \{0, 1\}^k \times \{0, 1\}^k \to \{0, 1\}$, let $f \circ g^n$
denote the composition of $f$ with $n$ independent copies of $g$. More formally, the
function $f \circ g^n : \{0, 1\}^{nk} \times \{0, 1\}^{nk} \to \{0, 1\}$ is given by

\[ (f \circ g^n)(x, y) = f(\ldots, g(x^{(i)}, y^{(i)}), \ldots), \]

where $x = (\ldots, x^{(i)}, \ldots) \in \{0, 1\}^{nk}$ and $y = (\ldots, y^{(i)}, \ldots) \in \{0, 1\}^{nk}$.
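The composition is easy to state in code. The sketch below is a hypothetical helper of our own, not taken from [SZ07]: it splits each player's input into $n$ consecutive blocks of $k$ bits, applies $g$ blockwise, and feeds the $n$ results to $f$. As a sanity check, composing OR with the one-bit AND recovers the function $\bigvee_i (x_i \wedge y_i)$.

```python
def block_compose(f, g, n, k):
    """Return F(x, y) = f(g(x^(1), y^(1)), ..., g(x^(n), y^(n))) on inputs of n*k bits each."""
    def F(x, y):
        assert len(x) == len(y) == n * k
        return f(tuple(g(x[i * k:(i + 1) * k], y[i * k:(i + 1) * k])
                       for i in range(n)))
    return F

OR = lambda z: int(any(z))
AND1 = lambda a, b: a[0] & b[0]                                # g on k = 1 bit
IP = lambda a, b: sum(ai & bi for ai, bi in zip(a, b)) % 2     # inner product mod 2

intersect3 = block_compose(OR, AND1, n=3, k=1)    # OR of bitwise ANDs on 3 bits
or_of_ip = block_compose(OR, IP, n=2, k=2)        # OR composed with two copies of IP_2
```

For example, `intersect3((1, 0, 1), (0, 1, 1))` is 1 because the two strings share the last position.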

This section presents Shi and Zhu's block composition method [SZ07], which
gives a lower bound on the communication complexity of $f \circ g^n$ in terms of certain
properties of $f$ and $g$. The relevant property of $f$ is simply its approximate degree.
The relevant property of $g$ is its spectral discrepancy, formalized next.

Definition 5.11 (Spectral discrepancy [SZ07]). Given $g : \{0, 1\}^k \times \{0, 1\}^k \to
\{0, 1\}$, its spectral discrepancy $\rho(g)$ is the least $\rho \ge 0$ for which there exist sets
$A, B \subseteq \{0, 1\}^k$ and a distribution $\mu$ on $A \times B$ such that

\[ \left\| \bigl[ \mu(x, y) (-1)^{g(x,y)} \bigr]_{x \in A, y \in B} \right\| \le \frac{\rho}{\sqrt{|A|\,|B|}}, \tag{5.8} \]

\[ \left\| \bigl[ \mu(x, y) \bigr]_{x \in A, y \in B} \right\| \le \frac{1 + \rho}{\sqrt{|A|\,|B|}}, \tag{5.9} \]

and

\[ \sum_{(x,y) \in A \times B} \mu(x, y) (-1)^{g(x,y)} = 0. \tag{5.10} \]

In view of (5.8) alone, the spectral discrepancy $\rho(g)$ is an upper bound on the
discrepancy $\mathrm{disc}(g)$. The key additional requirement (5.9) is satisfied, for example,
by doubly stochastic matrices [HJ86, §8.7]: if $A = B$ and all row and column sums
in $[\mu(x, y)]_{x \in A, y \in A}$ are $1/|A|$, then $\|[\mu(x, y)]_{x \in A, y \in A}\| = 1/|A|$.

As an illustration, consider the familiar function inner product modulo 2, given
by $\mathrm{IP}_k(x, y) = \bigoplus_{i=1}^{k} (x_i \wedge y_i)$.

Proposition 5.12 ([SZ07]). The function $\mathrm{IP}_k$ has $\rho(\mathrm{IP}_k) \le 1/\sqrt{2^k - 1}$.

Proof [SZ07]. Take $\mu$ to be the uniform distribution over $A \times B$, where $A = \{0, 1\}^k \setminus
\{0^k\}$ and $B = \{0, 1\}^k$. □
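The one-line proof can be unpacked numerically. The sketch below is our own illustration and rests on a stated assumption about the normalization of spectral discrepancy: the signed matrix $[\mu(x,y)(-1)^{g(x,y)}]$ should have spectral norm at most $\rho/\sqrt{|A||B|}$, and the signed entries should sum to zero under $\mu$. For the uniform $\mu$ on $A \times B$, the sign matrix $M$ of inner product restricted to nonzero rows satisfies $M M^T = 2^k I$, so its spectral norm is $2^{k/2}$, from which the value of $\rho$ follows.

```python
import math
from itertools import product

def ip_sign_matrix(k):
    """Sign matrix of inner product mod 2 with rows x != 0^k and all columns y."""
    A = [x for x in product((0, 1), repeat=k) if any(x)]
    B = list(product((0, 1), repeat=k))
    M = [[-1 if sum(a & b for a, b in zip(x, y)) % 2 else 1 for y in B] for x in A]
    return A, B, M

def check_ip(k):
    """Return (balanced, rows_orthogonal, rho) for the uniform distribution on A x B."""
    A, B, M = ip_sign_matrix(k)
    # Balance: the signed entries sum to zero, so they also do under uniform mu.
    balanced = sum(map(sum, M)) == 0
    # Rows are pairwise orthogonal with squared length 2^k, i.e. M M^T = 2^k I.
    rows_orthogonal = all(
        sum(a * b for a, b in zip(M[i], M[j])) == (2 ** k if i == j else 0)
        for i in range(len(A)) for j in range(len(A)))
    # Hence ||M|| = 2^{k/2}; with mu = 1/(|A||B|) the signed matrix has norm
    # 2^{k/2} / (|A||B|), and rho is that norm times sqrt(|A||B|).
    size = len(A) * len(B)
    rho = (math.sqrt(2 ** k) / size) * math.sqrt(size)
    return balanced, rows_orthogonal, rho
```

Under these assumptions, `check_ip(k)` returns a value of `rho` that agrees with $1/\sqrt{2^k - 1}$ up to floating-point error.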

We are prepared to state the general method. 

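Theorem 5.13 (Block composition method [SZ07]). Fix $f : \{0, 1\}^n \to \{0, 1\}$ and
$g : \{0, 1\}^k \times \{0, 1\}^k \to \{0, 1\}$. Put $d = \deg_{1/3}(f)$ and $\rho = \rho(g)$. If $\rho \le d/(2en)$, then

\[ Q(f \circ g^n) \ge Q^*(f \circ g^n) = \Omega(d). \]

Proof (adapted from [SZ07]). Fix sets $A, B \subseteq \{0, 1\}^k$ and a distribution $\mu$ on $A \times B$
with respect to which $\rho = \rho(g)$ is achieved. Define $f^* : \{0, 1\}^n \to \{-1, +1\}$ by
$f^*(z) = (-1)^{f(z)}$. Then one readily verifies that $\deg_{2/3}(f^*) = d$. By Theorem 5.1,
there exists $\psi : \{0, 1\}^n \to \mathbb{R}$ such that

\[ \hat{\psi}(S) = 0 \quad \text{for } |S| < d, \tag{5.11} \]

\[ \sum_{z \in \{0,1\}^n} |\psi(z)| = 1, \tag{5.12} \]

\[ \sum_{z \in \{0,1\}^n} \psi(z) f^*(z) > \frac{2}{3}. \tag{5.13} \]

Define matrices

\[ F = \Bigl[ f^*\bigl(\ldots, g(x^{(i)}, y^{(i)}), \ldots\bigr) \Bigr]_{x,y}, \qquad
   K = \Bigl[ 2^n \, \psi\bigl(\ldots, g(x^{(i)}, y^{(i)}), \ldots\bigr) \prod_{i=1}^{n} \mu(x^{(i)}, y^{(i)}) \Bigr]_{x,y}, \]

where in both cases the row index $x = (\ldots, x^{(i)}, \ldots)$ ranges over $A^n$ and the column
index $y = (\ldots, y^{(i)}, \ldots)$ ranges over $B^n$. In view of (5.10) and (5.13),

\[ \|K\|_1 = 1, \qquad \langle K, F \rangle > \frac{2}{3}. \tag{5.14} \]

We proceed to bound $\|K\|$. Put

\[ M_S = \Bigl[ \prod_{i \in S} (-1)^{g(x^{(i)}, y^{(i)})} \cdot \prod_{i=1}^{n} \mu(x^{(i)}, y^{(i)}) \Bigr]_{x,y}, \qquad S \subseteq [n]. \]

Then (5.8) and (5.9) imply, in view of the tensor structure of $M_S$, that

\[ \|M_S\| \le |A|^{-n/2} |B|^{-n/2} \rho^{|S|} (1 + \rho)^{n - |S|}. \tag{5.15} \]

On the other hand,

\[ \|K\| \le \sum_{S \subseteq [n]} 2^n |\hat{\psi}(S)| \, \|M_S\| \]
\[ \qquad = \sum_{|S| \ge d} 2^n |\hat{\psi}(S)| \, \|M_S\| \qquad \text{by (5.11)} \]
\[ \qquad \le \sum_{|S| \ge d} \|M_S\| \qquad \text{by (5.12) and Proposition 2.1} \]
\[ \qquad \le |A|^{-n/2} |B|^{-n/2} \sum_{i=d}^{n} \binom{n}{i} \rho^{i} (1 + \rho)^{n-i} \qquad \text{by (5.15).} \]

Since $\rho \le d/(2en)$, we further have

\[ \|K\| \le |A|^{-n/2} |B|^{-n/2} \, 2^{-\Theta(d)}. \tag{5.16} \]

In view of (5.14) and (5.16), the desired lower bound on $Q^*(F)$ now follows by the
generalized discrepancy method (Theorem 4.1). □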

Proposition 5.12 and Theorem 5.13 have the following consequence:

Theorem 5.14 ([SZ07]). Fix a function $f : \{0, 1\}^n \to \{0, 1\}$ and an integer $k \ge
2 \log_2 n + 5$. Then $Q(f \circ \mathrm{IP}_k^n) \ge Q^*(f \circ \mathrm{IP}_k^n) \ge \Omega(\deg_{1/3}(f))$.

For the disjointness function $\mathrm{Disj}_k(x, y) = \bigvee_{i=1}^{k} (x_i \wedge y_i)$, Shi and Zhu prove that
$\rho(\mathrm{Disj}_k) = O(1/k)$. Unlike Proposition 5.12, this fact requires a nontrivial proof
using Knuth's calculation of the eigenvalues of certain combinatorial matrices. In
conjunction with Theorem 5.13, this upper bound on $\rho(\mathrm{Disj}_k)$ leads with some work
to the following implication:

Theorem 5.15 ([SZ07]). Define $f : \{0, 1\}^n \times \{0, 1\}^n \to \{0, 1\}$ by $f(x, y) = D(|x \wedge y|)$,
where $D : \{0, 1, \ldots, n\} \to \{0, 1\}$ is given. Then

\[ Q(f) \ge Q^*(f) \ge \Omega\left(n^{1/3} \ell_0(D)^{2/3} + \ell_1(D)\right). \]

The symbols $\ell_0(D)$ and $\ell_1(D)$ have their meaning from Section 5.1. Theorem 5.15
is of course a weaker version of Razborov's celebrated lower bounds for symmetric
functions (Theorem 5.10), obtained with a different proof.

5.3 Pattern Matrix Method vs. Block Composition Method 

To restate the block composition method,

\[ Q^*(f \circ g^n) \ge \Omega(\deg_{1/3}(f)) \qquad \text{provided that} \qquad \rho(g) \le \frac{\deg_{1/3}(f)}{2en}. \]

The key player in this method is the quantity $\rho(g)$, which needs to be small. This
poses two complications. First, the function $g$ will generally need to depend on
many variables, from $k = \Theta(\log n)$ to $k = n^{\Theta(1)}$, which weakens the final lower
bounds on communication (recall that $\rho(g) \ge 2^{-k}$ always). For example, the lower
bounds obtained in [SZ07] for symmetric functions are polynomially weaker than
Razborov's optimal lower bounds (see Theorems 5.15 and 5.10, respectively).

A second complication, as Shi and Zhu note, is that "estimating the quantity
$\rho(g)$ is unfortunately difficult in general" [SZ07, §4.1]. For example, re-proving
Razborov's lower bounds reduces to estimating $\rho(g)$ with $g$ being the disjointness
function. Shi and Zhu accomplish this using Hahn matrices, an advanced tool that
is also the centerpiece of Razborov's own proof (Razborov's use of Hahn matrices
is somewhat more demanding).

These complications do not arise in the pattern matrix method. For example, it
implies (by setting $N = 2n$ in Theorem 5.7) that

\[ Q^*(f \circ g^n) \ge \Omega(\deg_{1/3}(f)) \]

for any function $g : \{0, 1\}^k \times \{0, 1\}^k \to \{0, 1\}$ such that the matrix $[g(x, y)]_{x,y}$ contains
the following submatrix, up to permutations of rows and columns:

\[ \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}. \tag{5.17} \]

To illustrate, one can take $g$ to be

\[ g(x, y) = x_1 y_1 \vee x_2 y_2 \vee x_3 y_3 \vee x_4 y_4, \]

or

\[ g(x, y) = x_1 y_1 y_2 \vee \overline{x_1} y_1 \overline{y_2} \vee x_2 \overline{y_1} y_2 \vee \overline{x_2} \, \overline{y_1} \, \overline{y_2}. \]

(In particular, the pattern matrix method subsumes Theorem 5.14.) To summarize,
there is a simple function $g$ on only $k = 2$ variables that works universally for all $f$.
This means no technical conditions to check, such as $\rho(g)$, and no blow-up in the
number of variables. As a result, in [She07b] we are able to re-prove Razborov's
optimal lower bounds exactly. Moreover, the technical machinery involved is
self-contained and disjoint from Razborov's proof.
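The submatrix condition can be checked directly for the two example functions. The sketch below is illustrative: it assumes the required $4 \times 4$ matrix is the pattern matrix of a single variable with block size 2, whose columns read $x_1, \overline{x_1}, x_2, \overline{x_2}$ (our reading of the display, stated here as an assumption), and it exhibits explicit witness rows and columns rather than searching for them. All names are ours.

```python
# Assumed form of the required submatrix: columns x1, not-x1, x2, not-x2,
# rows indexed by (x1, x2) = (1,1), (1,0), (0,1), (0,0).
PATTERN = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]

def g_or_of_ands(x, y):
    """g(x, y) = x1y1 OR x2y2 OR x3y3 OR x4y4, on k = 4 bits per player."""
    return int(any(a & b for a, b in zip(x, y)))

def g_two_bit(x, y):
    """g(x, y) = x1 y1 y2 OR ~x1 y1 ~y2 OR x2 ~y1 y2 OR ~x2 ~y1 ~y2, on k = 2 bits."""
    (x1, x2), (y1, y2) = x, y
    return ((x1 & y1 & y2) | ((1 - x1) & y1 & (1 - y2))
            | (x2 & (1 - y1) & y2) | ((1 - x2) & (1 - y1) & (1 - y2)))

def submatrix(g, rows, cols):
    """Restrict the communication matrix of g to the given rows and columns."""
    return [[g(x, y) for y in cols] for x in rows]

# Witnesses for the 4-bit OR of ANDs: rows encode (x1, ~x1, x2, ~x2),
# columns are the four unit vectors.
ROWS_4 = [(1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1)]
COLS_4 = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

# Witnesses for the 2-bit function: every row and column is used.
ROWS_2 = [(1, 1), (1, 0), (0, 1), (0, 0)]
COLS_2 = [(1, 1), (1, 0), (0, 1), (0, 0)]
```

Both `submatrix(g_or_of_ands, ROWS_4, COLS_4)` and `submatrix(g_two_bit, ROWS_2, COLS_2)` reproduce `PATTERN` exactly, so no permutation is even needed for these witnesses.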

We have just seen that the pattern matrix method gives strong lower bounds for
many functions to which the block composition method does not apply. However,
this does not settle the exact relationship between the scopes of applicability of the
two methods. Several natural questions arise. If a function $g : \{0, 1\}^k \times \{0, 1\}^k \to
\{0, 1\}$ has spectral discrepancy $\rho(g) \le \frac{1}{2e}$, does the matrix $[g(x, y)]_{x,y}$ contain (5.17)
as a submatrix, up to permutations of rows and columns? An affirmative answer
would mean that the pattern matrix method has a strictly greater scope of
applicability; a negative answer would mean that the block composition method works in
some situations where the pattern matrix method does not apply. If the answer is
negative, what can be said for $\rho(g) = o(1)$ or $\rho(g) = n^{-\Theta(1)}$?

Another intriguing issue concerns multiparty communication. As we will see
in Section 6, the pattern matrix method extends readily to the multiparty model.
This extension makes heavy use of the fact that the rows of a pattern matrix are
applications of the same function to different subsets of the variables. In the
general context of block composition (Section 5.2), it is unclear how to carry out this
extension. It is inviting to explore a synthesis of the two methods in the multiparty
model or another suitable context.

6 Extensions to the Multiparty Model 

In this section, we present extensions of the Degree/Discrepancy Theorem and of 
the pattern matrix method to the multiparty model. We start with some notation. 
Fix a function $\varphi : \{0, 1\}^n \to \mathbb{R}$ and an integer $N$ with $n \mid N$. Define the $(k, N, n, \varphi)$-pattern
tensor as the $k$-argument function $A : \{0, 1\}^{n(N/n)^{k-1}} \times [N/n]^n \times \cdots \times [N/n]^n \to
\mathbb{R}$ given by $A(x, V_1, \ldots, V_{k-1}) = \varphi(x|_{V_1, \ldots, V_{k-1}})$, where

\[ x|_{V_1, \ldots, V_{k-1}} \stackrel{\mathrm{def}}{=} \bigl( x_{1, V_1[1], \ldots, V_{k-1}[1]}, \; \ldots, \; x_{n, V_1[n], \ldots, V_{k-1}[n]} \bigr) \in \{0, 1\}^n \]

and $V_j[i]$ denotes the $i$th element of the $n$-dimensional vector $V_j$. (Note that we
index the string $x$ by viewing it as a $k$-dimensional array of $n \times (N/n) \times \cdots \times (N/n) =
n(N/n)^{k-1}$ bits.) This definition generalizes the author's pattern matrices if one
ignores the $\oplus$ operator (Section 5.1).

We are ready for the first result of this section, namely, an extension of the
Degree/Discrepancy Theorem (Theorem 3.2) to the multiparty model. This extension
was originally obtained by Chattopadhyay [Cha07, Lem. 2] for slightly different
tensors and has since been revisited in one form or another: [LS07, Thm. 19],
[CA08, Lem. 4.2]. The proofs of these several versions are quite similar and are in
close correspondence with the original two-party case.

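Theorem 6.1 ([Cha07, LS07, CA08]). Let $f : \{0, 1\}^n \to \{0, 1\}$ be given with
threshold degree $d \ge 1$. Let $N$ be a given integer, $n \mid N$. Let $F$ be the $(k, N, n, f)$-pattern
tensor. If $N \ge 4en^2 (k - 1) 2^{2^{k-1}} / d$, then $\mathrm{disc}(F) \le 2^{-d/2^{k-1}}$.

Proof (adapted from [Cha07, LS07, CA08]). As in the proof of the
Degree/Discrepancy Theorem, let $\mu$ be a probability distribution over $\{0, 1\}^n$
with respect to which $\mathbf{E}_{z \sim \mu}[(-1)^{f(z)} p(z)] = 0$ for every real-valued function $p$
of $d - 1$ or fewer of the variables $z_1, \ldots, z_n$. The existence of $\mu$ is assured by
Theorem 3.3. We will analyze the discrepancy of $F$ with respect to the distribution

\[ \lambda(x, V_1, \ldots, V_{k-1}) = 2^{-n(N/n)^{k-1} + n} \left(\frac{N}{n}\right)^{-n(k-1)} \mu\bigl(x|_{V_1, \ldots, V_{k-1}}\bigr). \]

Define $\psi : \{0, 1\}^n \to \mathbb{R}$ by $\psi(z) = (-1)^{f(z)} \mu(z)$. By Theorem [?],

\[ \mathrm{disc}_\lambda(F)^{2^{k-1}} \le 2^{n 2^{k-1}} \, \mathbf{E}_{\mathbf{V}} |\Gamma(\mathbf{V})|, \tag{6.1} \]

where we put $\mathbf{V} = (V_1^0, V_1^1, \ldots, V_{k-1}^0, V_{k-1}^1)$ and

\[ \Gamma(\mathbf{V}) = \mathbf{E}_x \Bigl[ \underbrace{\psi\bigl(x|_{V_1^0, \ldots, V_{k-1}^0}\bigr)}_{(\dagger)} \; \underbrace{\prod_{z \in \{0,1\}^{k-1} \setminus \{0^{k-1}\}} \psi\bigl(x|_{V_1^{z_1}, \ldots, V_{k-1}^{z_{k-1}}}\bigr)}_{(\ddagger)} \Bigr]. \]

For a fixed choice of $\mathbf{V}$, define sets

\[ A = \bigl\{ (i, V_1^0[i], \ldots, V_{k-1}^0[i]) : i = 1, 2, \ldots, n \bigr\}, \]
\[ B = \bigl\{ (i, V_1^{z_1}[i], \ldots, V_{k-1}^{z_{k-1}}[i]) : i = 1, 2, \ldots, n, \;\; z \in \{0, 1\}^{k-1} \setminus \{0^{k-1}\} \bigr\}. \]

Clearly, $A$ and $B$ are the sets of variables featured in the expressions $(\dagger)$ and $(\ddagger)$
above, respectively. To analyze $\Gamma(\mathbf{V})$, we prove two key claims analogous to those
in the Degree/Discrepancy Theorem.

Claim 6.2. Assume that $|A \cap B| \le d - 1$. Then $\Gamma(\mathbf{V}) = 0$.

Proof. Immediate from the fact that the Fourier transform of $\psi$ is supported on
characters of order $d$ and higher. □

Claim 6.3. Assume that $|A \cap B| = i$. Then $|\Gamma(\mathbf{V})| \le 2^{i 2^{k-1} - n 2^{k-1}}$.

Proof. Observation [?] shows that $|\Gamma(\mathbf{V})| \le 2^{-n 2^{k-1}} \cdot 2^{n 2^{k-1} - |A \cup B|}$. Furthermore, it is
straightforward to verify that $|A \cup B| \ge n 2^{k-1} - |A \cap B| \, 2^{k-1}$. □

In view of Claims 6.2 and 6.3, inequality (6.1) simplifies to

\[ \mathrm{disc}_\lambda(F)^{2^{k-1}} \le \sum_{i=d}^{n} 2^{i 2^{k-1}} \, \mathbf{P}[|A \cap B| = i]. \]

It remains to bound $\mathbf{P}[|A \cap B| = i]$. For a fixed element $a$, we have $\mathbf{P}[a \in B \mid a \in A] \le
(k - 1)n/N$ by the union bound. Moreover, given two distinct elements $a, a' \in
A$, the corresponding events $a \in B$ and $a' \in B$ are independent. Therefore,

\[ \mathbf{P}[|A \cap B| = i] \le \binom{n}{i} \left( \frac{(k-1)n}{N} \right)^{i}, \]

which yields the desired bound on $\mathrm{disc}_\lambda(F)$. □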

Remark 6.4. Recall from Section 5.1 that the two-party Degree/Discrepancy
Theorem was considerably improved in [She07b] using matrix-analytic techniques.
Those techniques, however, do not extend to the multiparty model. As a result,
Theorem 6.1 that we have just presented does not subsume the improved
Degree/Discrepancy Theorem (Theorem 5.9).

We now present an adaptation of the pattern matrix method (Theorem 5.7) to
the multiparty model, obtained by Lee and Shraibman [LS07] and independently
by Chattopadhyay and Ada [CA08]. The proof is closely analogous to the
two-party case. However, the spectral calculations for pattern matrices do not extend
to the multiparty model, and one is forced to fall back on the less precise
calculations introduced in the Degree/Discrepancy Theorem (Theorem 3.2). In particular,
the result we are about to present does not subsume the two-party pattern matrix
method.
Theorem 6.5 ([LS07, CA08]). Let $f : \{0, 1\}^n \to \{0, 1\}$ be given with $\deg_{1/3}(f) =
d \ge 1$. Let $N$ be a given integer, $n \mid N$. Let $F$ be the $(k, N, n, f)$-pattern tensor. If
$N \ge 4en^2 (k - 1) 2^{2^{k-1}} / d$, then $R_{1/3}(F) \ge \Omega(d / 2^k)$.

Proof (adapted from [LS07, CA08]). Define $f^* : \{0, 1\}^n \to \{-1, +1\}$ by $f^*(z) =
(-1)^{f(z)}$. Then it is easy to verify that $\deg_{2/3}(f^*) = d$. By Theorem 5.1, there is a
function $\psi : \{0, 1\}^n \to \mathbb{R}$ such that:

\[ \hat{\psi}(S) = 0 \quad \text{for } |S| < d, \]

\[ \sum_{z \in \{0,1\}^n} |\psi(z)| = 1, \]

\[ \sum_{z \in \{0,1\}^n} \psi(z) f^*(z) > \frac{2}{3}. \tag{6.2} \]

Fix a function $h : \{0, 1\}^n \to \{-1, +1\}$ and a distribution $\mu$ on $\{0, 1\}^n$ such
that $\psi(z) \equiv h(z) \mu(z)$. Let $H$ be the $(k, N, n, h)$-pattern tensor. Let $P$ be the
$(k, N, n, 2^{-n(N/n)^{k-1} + n} (N/n)^{-n(k-1)} \mu)$-pattern tensor. Then $P$ is a probability
distribution. By (6.2),

\[ \langle H \circ P, F^* \rangle > \frac{2}{3}, \tag{6.3} \]

where $F^*$ is the $(k, N, n, f^*)$-pattern tensor. As we saw in the proof of Theorem 6.1,

\[ \mathrm{disc}_P(H) \le 2^{-d/2^{k-1}}. \tag{6.4} \]

The theorem now follows by the generalized discrepancy method (Theorem 4.2) in
view of (6.3) and (6.4). □

The authors of [LS07] and [CA08] gave important applications of their work
to the $k$-party randomized communication complexity of disjointness, improving
it from $\Omega(\frac{1}{k} \log n)$ to $n^{\Omega(1/k)} 2^{-O(2^k)}$. As a corollary, they separated the multiparty
communication classes $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ for $k = (1 - o(1)) \log_2 \log_2 n$ parties. They
also obtained new results for Lovász-Schrijver proof systems, in light of the work
due to Beame, Pitassi, and Segerlind [BPS07].

7 Separation of $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$

We conclude this survey with a separation of $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ for $k = (1 - \epsilon) \log_2 n$
parties, due to David and Pitassi [DP08]. This is an exponential improvement over
the previous separation in [LS07, CA08]. The crucial insight in this new work is
to redefine the projection operator $x|_{V_1, \ldots, V_{k-1}}$ from Section 6 using the probabilistic
method. This removes the key bottleneck in the previous analyses [LS07, CA08].
Unlike the previous work, however, this new approach no longer applies to
disjointness.

We start with some notation. Fix integers $n, m$ with $n > m$. Let $\psi : \{0, 1\}^m \to \mathbb{R}$
be a given function with $\sum_{z \in \{0,1\}^m} |\psi(z)| = 1$. Let $d$ denote the least order of a
nonzero Fourier coefficient of $\psi$. Fix a Boolean function $h : \{0, 1\}^m \to \{-1, +1\}$ and
a distribution $\mu$ on $\{0, 1\}^m$ such that $\psi(z) \equiv h(z) \mu(z)$. For a mapping $\alpha : (\{0, 1\}^n)^k \to
\binom{[n]}{m}$, define a $(k + 1)$-party communication problem $H_\alpha : (\{0, 1\}^n)^{k+1} \to \{-1, +1\}$
by $H_\alpha(x, y_1, \ldots, y_k) = h(x|_{\alpha(y_1, \ldots, y_k)})$. Analogously, define a distribution $\lambda_\alpha$ on
$(\{0, 1\}^n)^{k+1}$ by $\lambda_\alpha(x, y_1, \ldots, y_k) = 2^{-(k+1)n + m} \mu(x|_{\alpha(y_1, \ldots, y_k)})$.
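Theorem 7.1 ([DP08]). Assume that $n \ge 16em^2 2^k$. Then for a uniformly random
choice of $\alpha : (\{0,1\}^n)^k \to \binom{[n]}{m}$,

$$\mathbf{E}_\alpha\left[\mathrm{disc}_{\lambda_\alpha}(H_\alpha)^{2^k}\right] \le 2^{-n/2} + 2^{-d2^k+1}.$$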



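Proof (adapted from [DP08]). By Theorem O

$$\mathrm{disc}_{\lambda_\alpha}(H_\alpha)^{2^k} \le 2^{m2^k}\,\mathbf{E}_Y|\Gamma(Y)|, \qquad (7.1)$$

where we put $Y = (y_1^0, y_1^1, \dots, y_k^0, y_k^1)$ and

$$\Gamma(Y) = \mathbf{E}_x\left[\prod_{z\in\{0,1\}^k} \psi\left(x|_{\alpha(y_1^{z_1},\dots,y_k^{z_k})}\right)\right].$$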

For a fixed choice of $Y$, we will use the shorthand $S_z = \alpha(y_1^{z_1}, \dots, y_k^{z_k})$. To
analyze $\Gamma(Y)$, we prove two key claims analogous to those in the Degree/Discrepancy
Theorem and in Theorem 6.1.

Claim 7.2. Assume that $|\bigcup S_z| > m2^k - d2^{k-1}$. Then $\Gamma(Y) = 0$.

Proof. If $|\bigcup S_z| > m2^k - d2^{k-1}$, then some $S_z$ must feature more than $m - d$
elements that do not occur in $\bigcup_{u\ne z} S_u$. But this forces $\Gamma(Y) = 0$ since the
Fourier transform of $\psi$ is supported on characters of order $d$ and higher. □
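The counting step behind this proof can be spelled out as follows. This is a sketch of one way to fill in the argument; the symbols $u_1$ and $u_2$ are introduced here for illustration and do not appear in the original.

```latex
% u_1 = number of elements lying in exactly one of the 2^k sets S_z,
% u_2 = number of elements lying in at least two of them.
\begin{align*}
u_1 + u_2 &= \Bigl|\bigcup_z S_z\Bigr|, \qquad
u_1 + 2u_2 \le \sum_{z\in\{0,1\}^k} |S_z| = m2^k, \\
u_1 &\ge 2\,\Bigl|\bigcup_z S_z\Bigr| - m2^k
     > 2\,\bigl(m2^k - d2^{k-1}\bigr) - m2^k = 2^k (m - d).
\end{align*}
% Pigeonhole over the 2^k sets: some S_z owns more than m - d elements,
% hence shares fewer than d elements with \bigcup_{u \ne z} S_u.
```

For that $S_z$, every character of order at least $d$ in the Fourier expansion of $\psi(x|_{S_z})$ involves a coordinate of $x$ on which no other factor of the product depends. Averaging over that coordinate annihilates the term, so the whole expectation vanishes.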

Claim 7.3. For every $Y$, $|\Gamma(Y)| \le 2^{-|\bigcup S_z|}$.

Proof. Immediate from Observation [ 
In view of (7.1) and Claims 7.2 and 7.3, we have

$$\mathbf{E}_\alpha\left[\mathrm{disc}_{\lambda_\alpha}(H_\alpha)^{2^k}\right] \le \sum_{i=d2^{k-1}}^{m2^k-m} 2^i \Pr_{Y,\alpha}\left[\Bigl|\bigcup S_z\Bigr| = m2^k - i\right].$$

It remains to bound the probabilities in the last expression. With probability at
least $1 - k2^{-n}$ over the choice of $Y$, the strings $y_1^0, y_1^1, \dots, y_k^0, y_k^1$ will all be
distinct. Conditioning on this event, the fact that $\alpha$ is chosen uniformly at random
means that the $2^k$ sets $S_z$ are distributed independently and uniformly over
$\binom{[n]}{m}$. A calculation now reveals that



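$$\Pr_{Y,\alpha}\left[\Bigl|\bigcup S_z\Bigr| = m2^k - i\right] \le k2^{-n} + \binom{m2^k}{i}\left(\frac{m2^k}{n}\right)^i \le k2^{-n} + 8^{-i}. \qquad □$$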
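To see how these estimates yield the bound of Theorem 7.1, the following sketch assumes the per-term probability bound has the form $k2^{-n} + 8^{-i}$ for $i \ge d2^{k-1}$ (an assumption about the garbled display, chosen because it is consistent with the hypothesis $n \ge 16em^2 2^k$ and with the theorem statement).

```latex
\begin{align*}
\mathbf{E}_\alpha\bigl[\mathrm{disc}_{\lambda_\alpha}(H_\alpha)^{2^k}\bigr]
 &\le \sum_{i=d2^{k-1}}^{m2^k-m} 2^i \bigl(k2^{-n} + 8^{-i}\bigr) \\
 &\le k2^{-n}\cdot 2^{m2^k} + \sum_{i \ge d2^{k-1}} 4^{-i} \\
 &\le 2^{-n/2} + \tfrac{4}{3}\cdot 2^{-d2^k}
  \le 2^{-n/2} + 2^{-d2^k+1}.
\end{align*}
```

The second line uses $\sum_{i \le m2^k - m} 2^i < 2^{m2^k}$. The third line uses $\log_2 k + m2^k \le n/2$, which follows from $n \ge 16em^2 2^k$ because both $m2^k$ and $k$ are then at most $n/(16e)$.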



We are ready to present the separation of $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$.

Theorem 7.4 (Separation of $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ [DP08]). Let $k \le (1 - \epsilon)\log_2 n$,
where $\epsilon > 0$ is a given constant. Then there exists a function
$F_\alpha : (\{0,1\}^n)^{k+1} \to \{-1,+1\}$ with $N(F_\alpha) = O(\log n)$ but $R_{1/3}(F_\alpha) = n^{\Omega(1)}$.

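Proof (adapted from [DP08]). Let $m = \lfloor n^\zeta \rfloor$ for a sufficiently small constant
$\zeta = \zeta(\epsilon) > 0$. As usual, define $\mathrm{OR}_m : \{0,1\}^m \to \{-1,+1\}$ by
$\mathrm{OR}_m(z) = 1 \Leftrightarrow z = 0^m$. It is known [NS92, Pat92] that
$\deg_{1/3}(\mathrm{OR}_m) = \Theta(\sqrt{m})$. As a result, Theorem O
guarantees the existence of a function $\psi : \{0,1\}^m \to \mathbb{R}$ such that:

$$\hat\psi(S) = 0 \quad \text{for } |S| < \Theta(\sqrt{m}),$$

$$\sum_{z\in\{0,1\}^m} |\psi(z)| = 1,$$

$$\sum_{z\in\{0,1\}^m} \psi(z)\,\mathrm{OR}_m(z) > \frac{1}{3}.$$
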

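Fix a function $h : \{0,1\}^m \to \{-1,+1\}$ and a distribution $\mu$ on $\{0,1\}^m$ such that
$\psi(z) = h(z)\mu(z)$. For a mapping $\alpha : (\{0,1\}^n)^k \to \binom{[n]}{m}$, let $H_\alpha$ and $\lambda_\alpha$ be as
defined at the beginning of this section. Then Theorem 7.1 shows the existence of
$\alpha$ such that

$$\mathrm{disc}_{\lambda_\alpha}(H_\alpha) \le 2^{-\Omega(\sqrt{m})}.$$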
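The passage from the expectation bound of Theorem 7.1 to this pointwise bound can be sketched as follows. The sketch assumes the theorem's conclusion in the form $\mathbf{E}_\alpha[\mathrm{disc}_{\lambda_\alpha}(H_\alpha)^{2^k}] \le 2^{-n/2} + 2^{-d2^k+1}$, where $d \ge \Theta(\sqrt{m})$ is the least order of a nonzero Fourier coefficient of $\psi$.

```latex
\begin{align*}
\mathrm{disc}_{\lambda_\alpha}(H_\alpha)^{2^k}
 &\le 2^{-n/2} + 2^{-d2^k+1}
  \le 2\cdot 2^{-d2^k+1} = 2^{-d2^k+2}
  && \text{(for some $\alpha$; uses $d2^k \le n/2 + 1$)} \\
\mathrm{disc}_{\lambda_\alpha}(H_\alpha)
 &\le 2^{-d + 2^{1-k}} \le 2^{-d+1} = 2^{-\Omega(\sqrt{m})}
  && \text{(since $d \ge \Theta(\sqrt{m})$)}
\end{align*}
```

Some $\alpha$ attains at most the average, which gives the first line. The hypothesis $n \ge 16em^2 2^k$ of Theorem 7.1 holds for large $n$ because $m = \lfloor n^\zeta \rfloor$ and $2^k \le n^{1-\epsilon}$ give $16em^2 2^k \le 16e\,n^{2\zeta + 1 - \epsilon}$, which is at most $n$ once $\zeta < \epsilon/2$. The same hypothesis, together with $d \le m$, gives $d2^k \le n/2 + 1$.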

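Using the properties of $\psi$, one readily verifies that $\langle H_\alpha \circ \lambda_\alpha, F_\alpha \rangle > 1/3$, where
$F_\alpha : (\{0,1\}^n)^{k+1} \to \{-1,+1\}$ is given by $F_\alpha(x, y_1, \dots, y_k) = \mathrm{OR}_m(x|_{\alpha(y_1,\dots,y_k)})$.
By the generalized discrepancy method (Theorem 4.2),

$$R_{1/3}(F_\alpha) \ge \Omega(\sqrt{m}) = n^{\Omega(1)}.$$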
On the other hand, $F_\alpha$ has nondeterministic complexity $O(\log n)$. Namely,
Player 1 (who knows $y_1, \dots, y_k$) nondeterministically selects an element
$i \in \alpha(y_1, \dots, y_k)$ and announces $i$. Player 2 (who knows $x$) then announces $x_i$ as the
output of the protocol. □
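The nondeterministic protocol can be simulated directly. The following is a minimal sketch under this section's convention that $\mathrm{OR}_m(z) = 1$ exactly when $z = 0^m$, so the value $-1$ plays the role of "true". The function names and the toy map `toy_alpha` are hypothetical and serve only as illustration.

```python
import itertools


def or_pm(z):
    """OR in the +/-1 convention of this section: +1 iff z is all zeros."""
    return 1 if not any(z) else -1


def f_alpha(x, ys, alpha):
    """F_alpha(x, y_1..y_k) = OR_m(x|_{alpha(y_1..y_k)})."""
    return or_pm([x[i] for i in sorted(alpha(ys))])


def accepts(x, ys, alpha):
    """Nondeterministic acceptance: some guess i in alpha(ys) has x_i = 1.
    Player 1 announces the guess i; Player 2 replies with the bit x_i."""
    return any(x[i] == 1 for i in alpha(ys))


def cost_bits(n):
    """Bits sent: an index i in [n] (ceil(log2 n) bits) plus the bit x_i."""
    return (n - 1).bit_length() + 1


def toy_alpha(ys):
    """Hypothetical alpha for n = 4, m = 2, k = 1:
    the chosen subset depends only on the first bit of y_1."""
    return {0, 1} if ys[0][0] == 0 else {2, 3}


# Exhaustive check on the toy instance: the protocol accepts exactly
# when F_alpha takes the value -1.
agree = all(
    accepts(x, ys, toy_alpha) == (f_alpha(x, ys, toy_alpha) == -1)
    for x in itertools.product((0, 1), repeat=4)
    for ys in [((0, 0, 0, 0),), ((1, 0, 0, 0),)]
)
```

The cost is $\lceil \log_2 n \rceil + 1$ bits regardless of $k$ and $m$, which is the source of the $O(\log n)$ bound.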

A recent follow-up result due to David, Pitassi, and Viola [DPV08] derandomizes
the choice of $\alpha$ in Theorem 7.4, yielding an explicit separation of $\mathrm{NP}^{cc}_k$ and
$\mathrm{BPP}^{cc}_k$ for $k \le (1 - \epsilon)\log_2 n$.